ABSTRACT

The Baryon Oscillation Spectroscopic Survey (BOSS), part of the Sloan Digital Sky Survey 
(SDSS) III project, has provided the largest survey of galaxy redshifts available to date, in 
terms of both the number of galaxy redshifts measured by a single survey, and the effective 
cosmological volume covered. Key to analysing the clustering of these data to provide
cosmological measurements is understanding the detailed properties of this sample. Potential issues
include variations in the target catalogue caused by changes either in the targeting algorithm
or properties of the data used, the pattern of spectroscopic observations, the spatial
distribution of targets for which redshifts were not obtained, and variations in the target sky density
due to observational systematics. We document here the target selection algorithms used to
create the galaxy samples that comprise BOSS. We also present the algorithms used to create
large scale structure catalogues for the final Data Release (DR12) samples and the associated
random catalogues that quantify the survey mask. The algorithms are an evolution of those 
used by the BOSS team to construct catalogues from earlier data, and have been designed 
to accurately quantify the galaxy sample. The code used, designated MKSAMPLE, is released 
with this paper. 

Key words: cosmology: observations - (cosmology:) large-scale structure of Universe 


1 INTRODUCTION 

The size of galaxy redshift surveys has grown exponentially over
the last decade and will continue to do so into the next, thanks to
the continuing development of instrumentation to undertake
multi-object spectroscopy (MOS) on dedicated telescopes. The scientific
driver for this dramatic increase is that galaxy redshift surveys
provide a wealth of cosmological and extra-galactic information. The
most easily accessible cosmological information is encoded in
2-point clustering statistics of the over-density field, which contain
both the Baryon Acoustic Oscillation (BAO) and Redshift Space
Distortion (RSD) signals. The BAO scale is a comoving
large-scale enhancement in pairs of galaxies separated by ~150 Mpc,
which can be used to track cosmological expansion. It arises from
the propagation of sound waves in the early Universe (Peebles &
Yu 1970; Sunyaev & Zel’dovich 1970; Doroshkevich et al. 1978),
and is quite insensitive to astrophysical processing that occurs on
smaller scales; thus BAO experiments are affected by a low level
of systematics (see review by Weinberg et al. 2013 for a
comparison of different methods). Redshift-Space Distortions arise from
the peculiar velocities of galaxies within a comoving frame, which
produce coherent distortions in the measured redshifts compared to
those produced by the Hubble expansion (Kaiser 1987). As these
velocities are gravitational in origin, the amplitude depends on the
rate of structure growth, and hence RSD allow tests of General
Relativity (GR) on large scales.

The BAO signature has now been detected in many different
galaxy surveys and analysed using a variety of methods. To show
the exponential growth in BAO measurements, Fig. 1 presents the
predicted error on the BAO scale expected for different surveys,
calculated as if the clustering signal from different directions was
optimally combined to provide the best possible single BAO
position measurement. We include results from various stages of the
2-degree-Field Galaxy Redshift Survey (2dFGRS; Colless et al. 2001,
2003), Sloan Digital Sky Survey (SDSS; York et al. 2000) and
WiggleZ (Drinkwater et al. 2010), and predictions for the continuation
of the SDSS project with eBOSS (Dawson et al. 2015). For
consistency, all calculations used the code of Seo & Eisenstein (2007),
approximating each survey as a single volume, limited in redshift and
area, and sampled by a constant density of galaxies, with numbers
approximately matching those of the actual surveys. Thus the
results themselves are not precise and are designed simply to
demonstrate the evolution rather than provide a quantitative comparison
between experiments. The best fit line shows the growth in the
impact of past surveys, following the development of Multi-Object
Spectrographs (MOS) on the Anglo-Australian telescope (Lewis et
al. 1998) and the Sloan telescope (Gunn et al. 2006), which
continues to the next generation with new MOS being developed
for the Hobby-Eberly telescope (HETDEX; Hill et al. 2008), the
Mayall telescope (DESI; Levi et al. 2013), the VISTA telescope
(4MOST; de Jong et al. 2014), the William Herschel Telescope
(WEAVE; Dalton et al. 2014), the Subaru telescope (PFS; Takada et
al. 2014) and the satellite experiments Euclid (Laureijs et al. 2011)
and WFIRST (Spergel et al. 2015). For clarity we only plot
approximate DESI and Euclid predictions in Fig. 1 to show the
general expected trend from these new instruments, as our simplified
approach is insufficient to provide a careful differential analysis of
these future projects. Also, there is significant uncertainty in the
predictions for Euclid, as a consequence of our lack of knowledge
about the galaxy population targeted: the prediction here uses the
predicted volume and galaxy density of Laureijs et al. (2011). The
higher redshift surveys of eBOSS and WiggleZ are inherently more
difficult and consequently they lie above the line: they push into
new redshift ranges, rather than to larger volumes.

In this paper, we present the target selection and catalogue
generation of the Data Release 12 (DR12; Alam et al. 2015)
samples of galaxies selected from the Baryon Oscillation Spectroscopic
Survey (BOSS; Dawson et al. 2012), which is part of SDSS-III
(Eisenstein et al. 2011). The spectroscopic sample has two primary
catalogues: LOWZ at z < 0.4, and CMASS covering 0.4 < z <
0.7 (see Section 3 for details). An overview of the BOSS
observations is provided in Section 2; see Dawson et al. (2012) for a full
description of the survey.

The work presented here follows on from the analysis of
previous data releases: DR12 is the third public SDSS data release
containing BOSS spectroscopic results. The first was DR9 (Ahn et 
al. 2012), when the survey was approximately one third complete. 
The creation of the large-scale structure catalogues from these data 


Figure 1. BAO measurement errors predicted for various surveys as a
function of the year of publication. In order to calculate these with a
consistent methodology we plot “predictions” using the code of Seo & Eisenstein
(2007) based on a single number of galaxies, and volume for each survey.
The surveys plotted are 2dFGRS early (Percival et al. 2001) and final (Cole
et al. 2005); SDSS-II LRGs (Eisenstein et al. 2005); WiggleZ (Blake et al.
2011); BOSS DR9 CMASS (Anderson et al. 2012); BOSS DR11 LOWZ
(Tojeiro et al. 2014) and CMASS (Anderson et al. 2014b). In terms of
survey volume, BOSS DR12 is very close to DR11 and we do not show it here.
We also present approximate predictions for the eBOSS, DESI and Euclid
future surveys (see text for details).

was outlined in Anderson et al. (2012), alongside the isotropic BAO
results, with the anisotropic results following in Anderson et al.
(2014a). The development of a method to remove potential
systematic errors in the clustering measurements caused by fluctuations in
the target catalogue was presented in Ross et al. (2012). The DR9
catalogues were used extensively for science, which also tested the
catalogues themselves. The clustering was compared with
simulations in Nuza et al. (2013), and with full model fits in Sanchez et
al. (2012). RSD were measured by Reid et al. (2012) and enhanced
by knowledge of passive galaxy evolution in Tojeiro et al. (2012):
the resulting GR tests were presented in Samushia et al. (2013).
Primordial non-Gaussianity was constrained by Ross et al. (2013),
while Zhao et al. (2013) reported neutrino masses, and Scoccola
et al. (2013) examined the time variation of physical constants.
This work led to further refinements of the catalogue creation
algorithm for the analysis of the second public BOSS data release
DR10 (Ahn et al. 2014), which coincided with an internal release
(called DR11). In particular, the code was rewritten into a
modular version, called MKSAMPLE, new weights were used to correct
for fluctuations in the expected target density, and new masks were
used for “bad” areas. These refinements were presented alongside
the BAO results for the CMASS sample in Anderson et al. (2014b)
and the LOWZ sample in Tojeiro et al. (2014), and were confirmed
to be robust to colour (Ross et al. 2014) and against possible
systematics in the fit (Vargas-Magana et al. 2014). As for DR9, the
results were extensively used, further testing the catalogues: RSD
measurements have been made in a number of different ways
(Beutler et al. 2014a; Samushia et al. 2014; Sanchez et al. 2014; Chuang
et al. 2013), the bispectrum calculated and analysed (Gil-Marin et
al. 2014a,b), and neutrino mass constraints presented (Beutler et
al. 2014b). Saito et al. (2015) account for redshift dependent
selection effects and compare clustering and RSD with predictions from
abundance matching.

We have now analysed the final BOSS DR12 galaxy sample
using an algorithm that builds on the work described above. This
paper on the targeting algorithm and catalogue creation method
is complemented by a series of papers measuring and analysing
clustering, splitting the BOSS galaxies into the sub-samples defined
by the two primary targeting algorithms, LOWZ and CMASS
(see Section 3 for details). BAO measurements are presented in
configuration-space (Cuesta et al. 2015) and Fourier-space
(Gil-Marin et al. 2015a), and RSD measurements made in Fourier-space
are presented in Gil-Marin et al. (2015b). Two further support
papers are provided in this set: Ross et al. (2015) considers the BOSS
selection function in more detail, presenting the observational
footprint, masks for image quality and Galactic extinction, and weights
to account for density relationships intrinsic to the imaging and
spectroscopic portions of the survey. Vargas-Magana et al. (2015)
presents systematic tests on the reconstruction algorithm used for
anisotropic BAO analyses. A subsequent set of analyses, to be
released soon, will consider jointly analysing the full BOSS sample,
without splitting by target selection.

Because the key cosmological measurements depend on the
density field, galaxy properties (except how they trace this field,
commonly quantified by a linear deterministic bias b) are
unimportant once redshifts have been measured, and cosmological surveys
are free to choose which galaxies to observe to optimise survey
efficiency and the bias b. BOSS targets luminous galaxies
for spectroscopic observations as they have a large bias, are
relatively easy to target, and have strong spectral features that ease
redshift determination. The target selection adopted by BOSS is an
extension of the targeting algorithms for the SDSS-II (Eisenstein et
al. 2001) and 2SLAQ (Cannon et al. 2006) Luminous Red
Galaxies (LRGs), targeting fainter and bluer galaxies in order to achieve
the desired number density of ~ 3 × 10^-4 h^3 Mpc^-3. The
majority of the galaxies are old stellar systems whose prominent 4000 Å
break makes them relatively easy to target using multi-colour data.
The data from which the samples are targeted are described in
Section 2, and the LOWZ and CMASS target selection algorithms are
discussed in detail in Section 3.

In order to do large-scale structure analyses with the sample
of spectroscopically observed galaxies, we have put together
catalogues including information on the detailed angular and radial
mask of the sample including the redshift completeness, the
observing conditions when the imaging and spectroscopic
observations were made, and the appropriate weights to give each object, as
well as random (i.e., unclustered) catalogues with the same
selection function. These collectively make up the large-scale structure
catalogues, whose contents are detailed in this paper.

Key to creating these catalogues for the BOSS galaxy surveys
is the ability to predict where we could have observed galaxies, as
well as where galaxies exist, thus defining the survey or sample
mask. This mask is intricately linked with the selection of
galaxies: in general, corrections for selection effects can be applied to
either the mask or the galaxy sample to produce a match between
the two. In order to understand the mask, we need to understand
both the target sample and the subsequent spectroscopy and
redshift measurement, which we briefly summarise in Section 4. The
BOSS galaxy mask is quantified using a random catalog, a Poisson
sampling of the volume covered by the selected galaxies,
including any variations in density other than the cosmological clustering
signal we wish to measure. The 3D mask does not have to be
quantified by a Poisson sampling, but this is a straightforward approach
that in effect provides a Monte-Carlo sampling of the volume
covered. This weighted random sample and the weighted galaxy
sample form the starting point for the key BOSS galaxy
clustering analyses. Section 5 presents the method adopted by the BOSS
team to prepare catalogues of galaxies and randoms, using routines
made publicly available in a code called MKSAMPLE. This is a
further extension of the code used for the early DR9 analyses, which
is described in Anderson et al. (2012).
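As a rough illustration of the Monte-Carlo idea only, the sketch below draws points uniformly over the sphere and keeps those that fall inside an angular footprint. It is not taken from MKSAMPLE: the function names, the `in_mask` callback and the rectangular `toy_mask` are all hypothetical, and the radial selection, completeness and weights that a real random catalogue carries are omitted.

```python
import math
import random


def random_sky_points(n, in_mask, seed=0):
    """Draw n points uniformly on the sphere that fall inside `in_mask`.

    `in_mask(ra_deg, dec_deg) -> bool` is a hypothetical stand-in for a real
    angular mask; it is an assumption of this sketch, not a MKSAMPLE routine.
    """
    rng = random.Random(seed)
    points = []
    while len(points) < n:
        ra = rng.uniform(0.0, 360.0)
        # Sampling sin(dec) uniformly gives equal probability per unit solid angle.
        dec = math.degrees(math.asin(rng.uniform(-1.0, 1.0)))
        if in_mask(ra, dec):
            points.append((ra, dec))
    return points


def toy_mask(ra, dec):
    """Illustrative rectangular footprint; NOT the BOSS footprint."""
    return 100.0 <= ra <= 260.0 and 0.0 <= dec <= 60.0


randoms = random_sky_points(1000, toy_mask)
```

Rejection sampling of this kind is simple but becomes inefficient when the footprint covers a small fraction of the sky; a production code would typically sample within the mask polygons directly.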

Although the targeting algorithm adopted for BOSS is
isotropic and the catalogue of target objects covers an angular area
larger than that of spectroscopic observations, the mask is
complicated by various anisotropic effects including variations in imaging
depth due to recalibration of the SDSS photometric scale and
rereduction of the imaging as the spectroscopic survey progressed,
variation of seeing, variation with stellar density caused by occultation
by stars, the inability to measure spectra for targets close to another target
observed at the same time, and the failure to measure spectra as a
function of signal-to-noise ratio in the spectrum. These effects are
often corrected by applying a weight to the galaxies (e.g. Ross et al.
2012), but could instead be incorporated into the mask. The quality
of the DR12 data is such that we can now observe systematic
effects that couple radial and angular fluctuations, and we introduce
3D corrections for these. The manner adopted to deal with these
effects for BOSS is described in Section 6.

BOSS includes a number of galaxy catalogues with different
selection functions, some of which spatially overlap. The
combination of these to optimally quantify the underlying matter
overdensity field is non-trivial, and we present the method adopted by the
BOSS team in Section 7.

The MKSAMPLE code will be released upon publication
of this paper, and we will also publish the resulting
Large-Scale Structure catalogues, with a full datamodel describing
each. These will all be linked from the main SDSS web site
http://www.sdss.org/surveys/boss.

2 DATA 

2.1 Imaging Data 

The Sloan Digital Sky Survey (SDSS-I/II; York et al. 2000)
imaged approximately 7,606 deg^2 of the Northern Galactic
Hemisphere and 600 deg^2 of the Southern Galactic Hemisphere in the
ugriz bands (Fukugita et al. 1996; Smith et al. 2002; Doi et al.
2010), using a specially designed camera (Gunn et al. 1998) on the
2.5m Sloan telescope (Gunn et al. 2006) at the Apache Point
Observatory in New Mexico. The SDSS-III project (Eisenstein et al.
2011) obtained additional imaging to make the region of the
Southern Galactic Hemisphere contiguous, covering 3,172 deg^2. As part
of this effort, the original SDSS-I/II data and the SDSS-III data
were reduced with the latest versions of the SDSS image
processing and calibration pipelines (Lupton et al. 2001; Pier et al. 2003;
Padmanabhan et al. 2008). These data were released as part of Data
Release 8 (Aihara et al. 2011), and form the parent imaging
catalogue for the BOSS galaxy target selection. There are a number
of differences between the processing performed for DR8 (see the
DR8 paper Aihara et al. 2011 for a detailed discussion) and
earlier reductions; reproducing BOSS galaxy samples derived from
the imaging data requires using the appropriate algorithms.

BOSS obtained spectra and redshifts for 1,372,737 galaxies
over 9,376 deg^2. The targets are assigned to tiles of diameter 3 deg,
using a tiling algorithm that is adaptive to the density of targets
on the sky (Blanton et al. 2003). Spectra are then obtained using
the BOSS spectrographs (Smee et al. 2013). Each observation is
performed in a series of 900 sec exposures, integrating until a
minimum signal-to-noise ratio is achieved for the faint galaxy targets.
Redshifts are then measured using the methods described in Bolton
et al. (2012). The spectroscopic observations were split into distinct
areas of sky, which we call chunks, targeted separately and
sequentially in time, each defined by a subset of the total footprint. Later
chunks can overlap earlier chunks and recover unobserved targets.
The angular distribution of chunks 2-11 is shown in Fig. A1. These
chunks are special as they reflect early versions of the target
selection (see Appendix A), but they also serve to show how the survey is
built up from chunks. Basic definitions for geometrical descriptors used in
this paper are provided in Table 1.

The start of spectroscopic observations preceded the
finalisation of the DR8 imaging reductions, so the imaging data used by
BOSS are based on the photometric measurements available at the
time of tiling (see Section 4.2), which may differ from the
quantities available for an object in the DR8 catalog. BOSS targeting
was performed using three different versions of the reduction
software that resolves the catalogues from overlapping imaging data
(RESOLVE; see Aihara et al. 2011). Chunks 1-4 used a version of
the RESOLVE software tagged on 14-06-2009, chunks 5-11 used
a version tagged on 16-11-2009, and chunks 12 onwards used the
same version as that used to produce DR8. In total, 17% of targets
were targeted with pre-DR8 RESOLVE versions. Because these
different versions of the software selected different imaging data to
be designated as “primary” (i.e. either the only or the best
observation of this object; see the DR8 documentation for more details),
approximately 9% of the imaging data used for targeting CMASS
galaxies is now designated as secondary^1 in the DR8 database.

2.2 Parent catalog 

The selection of galaxy targets for spectroscopic observation is
based on a parent catalogue of photometrically identified objects
within the imaging data. The parent catalogue was based on
objects chosen from 3172 deg^2 in the Southern Galactic Cap (SGC)
and 7606 deg^2 in the Northern Galactic Cap (NGC), as described in
this section. The SDSS imaging pipeline returns a number of
different measurements of the photometry of galaxies. Full
descriptions may be found on the SDSS website^2 and in Stoughton et al.
(2002). For galaxy target selection we use three photometric
measurements, which have all been corrected for Galactic extinction
using the Schlegel, Finkbeiner & Davis (1998) dust maps.

The colours of galaxies are based on SDSS model magnitudes
(denoted by the subscript mod). These are determined by using the
best-fit (PSF-convolved) de Vaucouleurs or exponential profile fit in
the r band to determine the fluxes in the other bands (full details
are provided in Abazajian et al. 2004). Cuts in apparent
magnitude are made with “cmodel” magnitudes (denoted by a subscript
cmod). These are a linear combination of the flux from the best fit
exponential and de Vaucouleurs profile fit in each band separately^3

$$ f_{\rm cmod} = (1 - P) f_{\rm exp} + P f_{\rm deV} , $$ (1)

where P is the best-fit coefficient obtained from a fit of the linear
combination of the de Vaucouleurs and exponential profile fits to the
image, and weights the different contributions (reported as fracdev
by the SDSS pipelines), and f represents the flux (not magnitude)
assuming an exponential or de Vaucouleurs profile. Star-galaxy
separation compares the PSF magnitudes of galaxies (denoted by a

^1 i.e., there is an overlapping observation with higher quality photometry
^2 http://www.sdss3.org/dr8
^3 contrasted with model magnitudes where the fit in the r band is used
spherical polygon

The base unit of a MANGLE mask. Spherical polygons are used to represent the boundaries of the imaging survey from which the 
targets are drawn, the circular fields defined by spectroscopic tiles, as well as regions to be removed from the survey footprint (e.g. 
the centerpost of each spectroscopic tile; see Sec. 5.1.1 for a full list). 

spectroscopic tile 

Output of the tiling algorithm providing a central location on the sky and list of targets to be observed for spectroscopic observations. 
Each tile has a circular field-of-view of radius 1.49 degrees, and can be observed by multiple plates. 

plate 

Physical plate with a hole drilled for each target, based on the anticipated airmass of observation. Spectroscopic tiles may be 
observed using multiple plates. 

chunk 

Basic unit of sky input to the tiling algorithm. It consists of a set of rectangles in a spherical coordinate system. The SDSS-III BOSS 
survey is composed of 38 chunks. 

sector 

The union of spherical polygons defined by a unique intersection of spectroscopic tiles. The survey completeness is treated as 
uniform within a sector. 


Table 1. Basic definitions for the geometric description of SDSS-III BOSS observations and the LSS MANGLE masks. 


subscript PSF) with model or cmodel magnitudes; PSF magnitudes
underestimate the flux from extended sources compared with the
model fits (see § 3.3.2 for details). Finally, we use “fiber2”
magnitudes (denoted by the subscript fib2) to estimate the expected flux
through the SDSS-III 2” fibres.
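The cmodel combination of Eq. (1) is simple enough to write down directly. The sketch below is illustrative only: the function names and the input fluxes are made up, and the conversion to a magnitude assumes fluxes in nanomaggies with the conventional logarithmic (Pogson) definition, whereas SDSS catalogue magnitudes are asinh magnitudes that differ from this at low signal-to-noise.

```python
import math


def cmodel_flux(f_exp, f_dev, frac_dev):
    """Eq. (1): weighted sum of exponential and de Vaucouleurs fluxes.

    frac_dev plays the role of P (reported as fracdev by the SDSS pipelines).
    """
    return (1.0 - frac_dev) * f_exp + frac_dev * f_dev


def pogson_mag(flux_nmgy):
    """Logarithmic magnitude for a flux in nanomaggies (assumed unit)."""
    return 22.5 - 2.5 * math.log10(flux_nmgy)


# Made-up fluxes for a galaxy that is mostly de Vaucouleurs-like.
f_cmod = cmodel_flux(f_exp=40.0, f_dev=60.0, frac_dev=0.75)
m_cmod = pogson_mag(f_cmod)
```

Because the combination is done in flux rather than magnitude, a galaxy with P = 1 recovers the pure de Vaucouleurs flux and one with P = 0 the pure exponential flux.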

The parent sample for the BOSS galaxy target selection is
constructed by selecting all detected objects that the photometric
pipeline classifies as galaxies, and that are chosen by RESOLVE
to be “primary”. The targeting software uses the photometry of the
primary objects to select targets for spectroscopic follow-up. The
variation in selected targets from different imaging data is
consistent with that expected given the photometric uncertainties, and so
we treat the regions targeted with pre- and final DR8 imaging as
statistically identical. We do not make any cuts on
photometricity at this stage; unphotometric data are discarded at the catalogue
creation stage (see § 5.1.1). Users constructing their own samples
for science analyses are advised to use the CALIB_STATUS flag to
cut on photometricity (restricting to photometric observations
corresponds to CALIB_STATUS==1). We cull objects with suspect
photometry as reported in the flags set by the imaging pipeline. In
particular, we require that objects are detected in both the r and i bands.
In the Image Processing pipeline, this is indicated by having one
of the BINNED1, BINNED2 or BINNED4 flags set in both the r
and i bands. We also require that the OBJC_FLAG flag, which is a
combination of the per-filter flags appropriate for the whole object
(the full definition is provided in Stoughton et al. 2002), satisfies
the following conditions:

(i) Objects not to be saturated: (NOT SATUR) OR (SATUR
AND (NOT SATUR_CENTER)),

(ii) Blended objects: (NOT BLENDED) OR (NOT NODEBLEND),

(iii) Other photometric quality flags: (NOT BRIGHT) AND
(NOT TOO_MANY_PEAKS) AND (NOT PEAKCENTER) AND
(NOT NOTCHECKED) AND (NOT NOPROFILE).


3 TARGET SELECTION 

We now turn to the specifics of the target selection algorithms used
to define the BOSS spectroscopic galaxy samples. We first
summarize the criteria that we wish our algorithm to satisfy (§ 3.1), with
the aim of defining a uniformly selected sample over a broad
redshift range. The galaxy sample is targeted using two different
algorithms, which we term “LOWZ” (detailed in § 3.2) and “CMASS”
(for “Constant (stellar) Mass”, § 3.3), respectively. Star-galaxy
separation is treated differently in the CMASS sample than elsewhere
in SDSS, as we describe in § 3.3.2. A variant of the CMASS
algorithm was used to explore the colour boundaries of the sample
(§ 3.4).


3.1 Requirements and Criteria 

The BOSS sample was designed to measure the BAO signature in
the two-point galaxy clustering signal, and in particular to meet
error requirements on the measurement of the angular diameter
distance d_A and Hubble parameter H at z = 0.35 and z = 0.6. These
requirements can be met by a survey covering an area of
approximately 10,000 deg^2 with a comoving number density of galaxies
of 3 × 10^-4 h^3 Mpc^-3 for 0.1 < z < 0.6. This density is close to
optimal for large-scale cosmological studies (e.g., Kaiser 1986). To
efficiently undertake such a survey using the Sloan telescope and
spectrographs, we need to select a sub-sample of the parent
catalogue of photometrically identified objects that fulfil the following
criteria simultaneously:

(i) galaxies that lie in the desired redshift range 0.1 < z < 0.6,

(ii) sufficient galaxies to meet the desired density over the full 
redshift range, 

(iii) well-defined limits in stellar populations, to isolate a 
strongly clustered subsample of galaxies, 

(iv) redshifts that can be measured in a relatively short exposure 
with our telescope, 

(v) few contaminating objects that are not part of the desired 
sample, 

(vi) selectable uniformly across the desired area, 

(vii) selection is not sensitive to systematic errors in the data 
used. 

The challenge of target selection is to provide an algorithm for
selecting the subsample of the parent imaging catalogue that
optimally meets these goals. Selection based solely on an
apparent magnitude cut, as used for the SDSS-I and -II Main Galaxy
Sample (Strauss et al. 2002), in general selects too many low
redshift and low luminosity galaxies. Rather, in BOSS we follow
a similar philosophy to the selection of Luminous Red Galaxies
(LRGs) in SDSS-I and -II (Eisenstein et al. 2001) and the 2SLAQ
survey (Cannon et al. 2006) using colour-magnitude and
colour-colour cuts, selecting luminous galaxies with strong spectral
features (item (iv) above).

At redshifts z < 0.4, we can select such a sample by
extending to fainter LRGs than observed in SDSS-I and -II. At higher
redshifts, we do not restrict ourselves to red galaxies, and instead
select an approximately stellar mass-limited sample of objects of all
intrinsic colours. As in Eisenstein et al. (2001), two sets of colours
are necessary to describe the colour locus: one when the 4000 Å
break lies in the SDSS g-band, and the other when it redshifts into
the r-band at z ~ 0.4. Selecting these two subsamples requires
defining fiducial colours that track the locus of a passively evolving
population of galaxies in gri colour space. Following Eisenstein et
Figure 2. Top panel: Black dots show median $c_{\parallel}$ for LOWZ
spectroscopically confirmed galaxies as a function of measured redshift, with the dashed
lines showing the interquartile range. The efficiency of using this quantity
to track redshift is clear. Bottom panel: Median $r_{\rm mod} - i_{\rm mod}$ as a function
of redshift for confirmed CMASS galaxies, with interquartile range (dashed
lines). The way in which we can track the high-redshift locus of galaxies
using this colour, and select as a function of redshift, is clear.

Figure 3. Density plot of LOWZ galaxies in the (g - r, r - i) colour plane;
red corresponds to higher density and dark blue to lower density, in an
arbitrary normalisation and linear scale. Redshift increases rightwards and
upwards along the galaxy locus, starting at z ~ 0.1 in the bottom left
corner. The knee on the galaxy locus is caused by the 4000 Å break
transitioning between the g- and r-band filters, and happens at z ≈ 0.4. The
colours $c_{\parallel}$ and $c_{\perp}$ are simple rotations of this colour plane, and trace the
position of a target in parallel and perpendicular, respectively, to the data
locus. The black thick line represents the passively evolving LRG model of
Maraston et al. (2009). The green and red dashed lines are the colour and
magnitude targeting cuts - see the main text for details. The few targets seen
outside of the selection cut are due to differences in the targeting and final
photometry, see Section 2.



al. (2001) and Cannon et al. (2006), we define 

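$$ c_{\parallel} = 0.7 (g_{\rm mod} - r_{\rm mod}) + 1.2 (r_{\rm mod} - i_{\rm mod} - 0.18) $$ (2)

$$ c_{\perp} = (r_{\rm mod} - i_{\rm mod}) - (g_{\rm mod} - r_{\rm mod})/4.0 - 0.18 $$ (3)

to describe the low redshift locus and

$$ d_{\perp} = (r_{\rm mod} - i_{\rm mod}) - (g_{\rm mod} - r_{\rm mod})/8 , $$ (4)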

to describe the high-redshift locus. As discussed above, the colours
are defined using SDSS model magnitudes, and are corrected for
Milky Way extinction. The efficiency of these selections to
select luminous galaxies as a function of redshift is demonstrated in
Fig. 2, which shows $c_{\parallel}$ and $r_{\rm mod} - i_{\rm mod}$ versus redshift for
observed BOSS galaxies.
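The three auxiliary colours can be sketched as plain functions of the extinction-corrected model magnitudes. The snippet assumes the sign conventions of Eisenstein et al. (2001) and Cannon et al. (2006) for Eqs. (2)-(4); the function names and the example magnitudes are made up for illustration.

```python
def c_parallel(g_mod, r_mod, i_mod):
    """Eq. (2): position along the low-redshift colour locus."""
    return 0.7 * (g_mod - r_mod) + 1.2 * (r_mod - i_mod - 0.18)


def c_perp(g_mod, r_mod, i_mod):
    """Eq. (3): offset perpendicular to the low-redshift colour locus."""
    return (r_mod - i_mod) - (g_mod - r_mod) / 4.0 - 0.18


def d_perp(g_mod, r_mod, i_mod):
    """Eq. (4): colour used to describe the high-redshift locus."""
    return (r_mod - i_mod) - (g_mod - r_mod) / 8.0


# Made-up model magnitudes: g - r = 1.6 and r - i = 0.6.
g, r, i = 19.9, 18.3, 17.7
colours = (c_parallel(g, r, i), c_perp(g, r, i), d_perp(g, r, i))
```

All three quantities depend only on the two colours g - r and r - i, so adding the same offset to every band leaves them unchanged.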

Where the targeting algorithms use colour selection, they are
built on model magnitudes, which are based on the flux measured
through equivalent apertures in all bands and thus provide unbiased
colours of galaxies. Brightness limits are instead based on cmodel
magnitudes, which provide better estimates of the total light
observed.

3.2 The LOWZ sample 

The LOWZ sample is designed to extend the SDSS-I/II Cut I LRG
sample (Eisenstein et al. 2001) to z ≈ 0.4 and to fainter luminosities,
in order to increase the number density of the sample by roughly a
factor of 3. Fig. 3 shows how the colours $c_{\parallel}$ and $c_{\perp}$ describe the
evolution of a passively evolving stellar population with redshift.
Redshift increases from the bottom left to upper right. The black
line shows the passively evolving LRG model of Maraston et al.
(2009). The Maraston et al. (2009) ’LRG’ template is a model of a
metal-rich population in passive evolution containing a small
fraction of a metal-poor coeval population. This model was found to
fit the g, r, i colours of luminous red galaxies (LRGs)
from the 2SLAQ survey (Cannon et al. 2006) as a function of
redshift better than models containing star formation in various amounts. The
same model also better fit the overall luminosity evolution of BOSS
galaxies (Montero-Dorta et al. 2015). The knee seen in the galaxy
locus corresponds to the transition of the 4000 Å break from the
g- to the r-band. The parameter $c_{\parallel}$ quantifies the position of a
galaxy along the main locus, and $c_{\perp}$ characterises the departure of
a galaxy from the centre of the locus; $c_{\perp} = 0$ lies approximately at
the centre of the galaxy distribution.

We select targets at low redshift (z < 0.4) around the
predicted colour locus using

$$ |c_{\perp}| < 0.2 , $$ (5)

(red dashed lines in Fig. 3) and we select the brightest and reddest
objects at each redshift using a sliding colour-magnitude cut with
$c_{\parallel}$ (an effective proxy for a photometric redshift):

$$ r_{\rm cmod} < 13.5 + c_{\parallel}/0.3 . $$ (6)

The dashed green lines in Fig. 3 show the effective cuts in $c_{\parallel}$ for
three different r-band magnitudes: r = 16, 18.73 and 19.6 mag,
corresponding to the bright boundary, the median magnitude and
the faint boundary of the sample respectively. Thus fainter
objects must be redder to pass the cut. This cut is the most important
criterion in the selection of LOWZ galaxies - it drives the number
density of the sample by effectively setting the magnitude limit as a
function of redshift, and aims to produce a constant number density
over the desired redshift range. The number of galaxies in the
sample is therefore highly sensitive to this cut (see Ross et al. 2012;
Tojeiro et al. 2014). The resulting space density of the sample is
shown in Fig. 11; the sample is close to volume-limited (constant
space density at ~ 3 × 10^-4 h^3 Mpc^-3) over the redshift range
0.2 < z < 0.4.

We impose brightness limits on the targets, such that

$16 < r_{\rm cmod} < 19.6$. (7)

The faint limit ensures a high redshift success rate. The bright
limit excludes a significant number of low-redshift blue galaxies
that would otherwise pass the colour cut, but also excludes a fraction
of brightest cluster galaxies in low-redshift massive clusters
(Hoshino et al. 2015). A bright cut was not needed in SDSS-I/II as
such galaxies were already targeted by the SDSS-I/II Main Galaxy
Sample (Strauss et al. 2002), but a significant fraction of the BOSS
footprint lies outside that of SDSS-I and -II.

The star-galaxy separation follows the same procedure introduced
in Eisenstein et al. (2001) for the LRGs,

$r_{\rm psf} - r_{\rm cmod} > 0.3$. (8)

The cmodel magnitude is a proxy for a "total" magnitude for a
galaxy, while the PSF magnitude fits the unresolved component of
the object. The difference between the two is therefore a measure
of the extendedness of the galaxy.

In summary, the LOWZ selection algorithm, as implemented
after commissioning, is as follows:

$r_{\rm cmod} < 13.5 + c_\parallel / 0.3$ (9)

$|c_\perp| < 0.2$ (10)

$16 < r_{\rm cmod} < 19.6$ (11)

$r_{\rm psf} - r_{\rm cmod} > 0.3$ (12)
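The four cuts of Eqs. 9-12 can be sketched as a single predicate. This is only an illustration, not the targeting code itself: the function name and argument names are hypothetical, and the colours $c_\parallel$ and $c_\perp$ are assumed to have been computed beforehand from the model magnitudes as defined earlier in the paper.

```python
def is_lowz_target(r_cmod, r_psf, c_par, c_perp):
    """Return True if an object passes the LOWZ cuts of Eqs. 9-12.

    r_cmod, r_psf : r-band cmodel and PSF magnitudes
    c_par, c_perp : the colours c_parallel and c_perp (assumed precomputed)
    """
    sliding_cut = r_cmod < 13.5 + c_par / 0.3   # Eq. 9: fainter objects must be redder
    colour_locus = abs(c_perp) < 0.2            # Eq. 10: stay close to the LRG locus
    brightness = 16.0 < r_cmod < 19.6           # Eq. 11: bright and faint limits
    extended = (r_psf - r_cmod) > 0.3           # Eq. 12: star-galaxy separation
    return sliding_cut and colour_locus and brightness and extended
```

For example, an object with r_cmod = 18.0 and c_par = 1.5 passes the sliding cut because 13.5 + 1.5/0.3 = 18.5, whereas the same colour at r_cmod = 19.0 does not.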


The galaxies in the LOWZ sample may be selected from the
DR12 database using the following flags, whose definitions can be
found on the SDSS website⁴:

• BOSS_TARGET1 && 2^0 Objects targeted by the LOWZ
algorithm.

• SPECPRIMARY == 1 Objects with spectra, removing
duplicate observations.

• ZWARNING_NOQSO == 0 Objects whose spectroscopic
redshifts are cleanly measured.

• CLASS_NOQSO == 'GALAXY' Objects whose spectra
are those of a galaxy (as opposed to a quasar or star).
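As an illustration of how these four flags combine, the sketch below filters a list of catalogue rows. The row layout (one Python dict per object, keyed by the column names quoted above) and the function name are assumptions made for the example; the target flag is tested as a bitmask, so bit 0 corresponds to the LOWZ value 2^0.

```python
LOWZ_BIT = 0  # BOSS_TARGET1 bit for the LOWZ algorithm (value 2^0)


def select_galaxies(rows, target_bit):
    """Keep rows that carry the given BOSS_TARGET1 bit and have a clean,
    primary spectrum classified as a galaxy."""
    mask = 1 << target_bit
    selected = []
    for row in rows:
        targeted = (row["BOSS_TARGET1"] & mask) != 0       # bitwise test of the flag
        primary = row["SPECPRIMARY"] == 1                  # drop duplicate observations
        clean = row["ZWARNING_NOQSO"] == 0                 # redshift cleanly measured
        galaxy = row["CLASS_NOQSO"].strip() == "GALAXY"    # not a star or quasar
        if targeted and primary and clean and galaxy:
            selected.append(row)
    return selected
```

The same function can be reused for other samples by passing a different bit number.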

The basic properties of the LOWZ sample are presented in
Parejko et al. (2013), who fitted the small-scale clustering of the
galaxies using halo occupation distribution (HOD) modelling. They
demonstrated that these galaxies lie in massive haloes, with a mean
halo mass of $5.2 \times 10^{13}\,h^{-1}\,M_\odot$, a large-scale bias of ~ 2.0 and a
satellite fraction of 12 ± 2%. These galaxies occupy haloes with
average masses between those of the CMASS sample and the original
SDSS-I/II LRG sample.

3.2.1 Exceptions to the LOWZ targeting 

During the first nine months of BOSS observations, an incorrect
star-galaxy separation criterion was used to identify LOWZ targets,
removing a significant fraction of galaxies (see Appendix A). To select
a uniformly-targeted sample from all LOWZ redshifts, with the
selection criteria described in this section, the simplest procedure
is to avoid those data with the use of an additional cut

• TILEID ≥ 10324,

where TILEID identifies spectroscopic tiles, and this cut corresponds
to chunk numbers larger than 6.

⁴ http://www.sdss.org

Further details on this issue, and other slight changes in the
targeting of LOWZ galaxies in early chunks, can be found in Appendix A.
Briefly, LOWZ targets in chunk 2, and LOWZ targets in
chunks 3-6, were selected with different algorithms from those of
subsequent data. For the purposes of a large-scale structure catalogue,
in previous data releases we simply removed chunks 2-6 from the
LOWZ sample and the corresponding mask. In § 7 we construct
separate samples using the chunk 2 ("LOWZE2") and chunk 3-6
("LOWZE3") selections, and combine all three LOWZ catalogues
with the CMASS samples to construct a single unified sample appropriate
for analyses restricted to large scales, such as BAO fitting.
The effects of these changes on the density of galaxies measured as
a function of redshift can be seen in Fig. 11.

3.3 The CMASS sample 

The CMASS sample uses similar selection cuts to those utilised
by the Cut-II LRGs from SDSS-I/II and the LRGs in 2SLAQ, but
extends them both bluer and fainter in order to increase the number
density of targets in the redshift range $0.4 < z < 0.7$ and get closer
to a mass limited sample.

The quantity $d_\perp$ (Fig. 4) effectively discards low-redshift
galaxies by choosing

$d_\perp > 0.55$. (13)

We do not apply any further colour cuts, with the exception of a
sliding colour-magnitude cut that selects the brightest objects at
each redshift, in such a way as to keep an approximately constant
stellar mass limit over the redshift range of CMASS according to
the passively evolving model of Maraston et al. (2009):

$i_{\rm cmod} < \min\left(19.86 + 1.6(d_\perp - 0.8),\ 19.9\right)$. (14)

This approach is a significant departure from SDSS-I/II Cut-II
and 2SLAQ LRGs - which consisted of essentially a flux-limited
sample with a colour cut to isolate the reddest galaxies.

We impose the following magnitude limits:

$17.5 < i_{\rm cmod} < 19.9$, (15)

$i_{\rm fib2} < 21.5$. (16)

The faint magnitude limits are set to ensure a high redshift success
rate, whereas the bright limit protects against some low-redshift
interlopers. In the first 14 tiling chunks, CMASS objects were targeted
with $i_{\rm fib2} < 21.7$, but the redshift success rate at the faint end
of this range was quite poor, so we revised this limit to the final
value of $i_{\rm fib2} < 21.5$.

To exclude outliers with problematic deblending, we further
impose the following cuts on colour and $r_{{\rm dev},i}$ (the effective radius
in the fit to the de Vaucouleurs profile for the i-band magnitude,
measured in pixels):

$r_{\rm mod} - i_{\rm mod} < 2$ (18)

$r_{{\rm dev},i} < 20.0$ pix. (19)

These cuts remove a very small fraction of targets. The CMASS 
star-galaxy separation is described in detail in the next section. 

CMASS galaxies can be selected from the DR12 database using
the following flags:

• BOSS_TARGET1 && 2^1

• SPECPRIMARY == 1

• ZWARNING_NOQSO == 0

• CLASS_NOQSO == 'GALAXY'

Figure 4. Both panels show density plots of CMASS galaxies; red
corresponds to higher density and dark blue to lower density in an arbitrary
normalisation and linear scale. The black thick line shows the passively
evolving LRG model of Maraston et al. (2009). Top: redshift increases
upwards, starting at $z \sim 0.4$ at $d_\perp = 0.55$. Bottom: the sliding cut in $d_\perp$
with i-band magnitude, designed to select an approximately stellar-mass
complete sample. Stellar mass increases with the perpendicular distance to
the sliding cut, represented here by the red dashed line - see Maraston et al.
(2013) for details. The green dashed line shows the sliding cut adopted for
the CMASS_SPARSE sample (see Section 3.4). Vertical solid lines show
the magnitude limits. On both panels, the small fraction of targets that lie
outside of the selection cut are due to differences in the targeting and final
photometry, see Section 2. Only chunks greater than 6 are shown.

The basic clustering properties of the CMASS sample are presented
in White et al. (2011), who fitted the small-scale clustering
of the galaxies using HOD modelling. They showed that these
galaxies lie in massive haloes, with a mean halo mass of
$2.6 \times 10^{13}\,h^{-1}\,M_\odot$, a large-scale bias of ~ 2.0 and a satellite fraction of 10%.
These galaxies occupy haloes with lower masses than those of the
LOWZ sample, although the bias is similar, a consequence of them
being at higher redshift.

CMASS galaxies are massive, with $M_* > 10^{11}\,M_\odot$ (e.g.
Chen et al. 2012; Maraston et al. 2013), and the majority are
dominated by old stellar populations with low star-formation rates
(e.g. Chen et al. 2012; Thomas et al. 2013; Tojeiro et al. 2012).
Maraston et al. (2013) argues that the CMASS sample becomes 


significantly incomplete at stellar masses M* < and 

$z > 0.6$ for a Kroupa initial mass function, and is roughly consistent
with a volume-limited sample at higher masses and lower redshift.
Thomas et al. (2013) presented similar results, showing that
stellar velocity dispersions of BOSS galaxies peak at ~ 240 km s$^{-1}$
with a narrow distribution virtually independent of redshift. Most
recently, Leauthaud et al. (2015) quantified the stellar mass completeness
of CMASS and LOWZ using data from the Stripe 82 region
of sky along the celestial equator - a narrow subset
of the SDSS imaging survey region that is 2 magnitudes deeper
than the single epoch SDSS imaging (Annis et al. 2014). Using the
Stripe 82 Massive Galaxy Catalog (Bundy et al. 2015), they estimate
that CMASS is 80% complete at $\log_{10}(M_*/M_\odot) > 11.6$
in the redshift range $z = [0.51, 0.61]$. The stellar mass completeness
of CMASS decreases at lower and higher redshifts and the
denomination "constant mass" should be considered only as a loose
approximation outside of the redshift window $z = [0.51, 0.61]$.
However, the combination of LOWZ and CMASS yields a spectroscopic
sample that is 80% complete at $\log_{10}(M_*/M_\odot) > 11.6$
at $z < 0.61$. Compared to Cut-II LRGs, CMASS galaxies have
a larger range of properties including morphology (Masters et al.
2011), star-formation rates (Thomas et al. 2013; Chen et al. 2012)
and star-formation histories (Tojeiro et al. 2012), partly because
no red cut has been imposed on the g-r observed-frame colour. It
should be noted however that, for example, galaxies with detectable
emission-lines (hence hosting very young stellar populations) still
represent only 4% of the sample (see Thomas et al. 2013).


3.3.1 Exceptions in CMASS targeting flag 

The meaning of BOSS_TARGET1 && 2^1 (the CMASS targeting
flag) evolved during the first 14 chunks of the survey. Therefore
BOSS_TARGET1 && 2^1 will not select CMASS galaxies (as defined
by the equations in the previous sections) in these regions, and
further subsampling based on galaxy colours and magnitudes is
required to recover the final selection. Alternatively,
these chunks can be explicitly excluded. For the first 14 chunks the
following exceptions should be noted:

• Chunks 1 & 2: The data taken in the commissioning phase
(chunks 1 & 2) used significantly broader selection criteria (see
Section 3.3.3), and therefore must be dealt with carefully.

• Chunks 3-6: The data taken in chunks 3-6 used a slightly
looser $i_{\rm cmod}$ cut, selecting instead on
$i_{\rm cmod} < 19.92 + 1.6(d_\perp - 0.8)$.

• Chunks 1-14: As mentioned above, the cut in $i_{\rm fib2}$ changed
during the survey. In chunks 1-14 the targeting required
$i_{\rm fib2} < 21.7$.

With the exception of chunk 1, all of these chunks are included in
the LSS catalogue after applying the required subsampling based
on colours and/or magnitudes.


3.3.2 Star-Galaxy Separation in the CMASS sample 

The difference between psf and model magnitudes is a measure
of the extendedness of a source, thus making it useful to separate
stars from galaxies. For the commissioning phase of the survey we
applied a star-galaxy separation criterion identical to that used in
the 2SLAQ survey (Cannon et al. 2006), a sloping cut in $i_{\rm psf} - i_{\rm mod}$:

$i_{\rm psf} - i_{\rm mod} > 0.2 + 0.2(20 - i_{\rm mod})$. (20)

Whilst this cut is effective at removing the bulk of the stars, roughly
6.9% of ~7000 CMASS targets from the commissioning runs had
stellar spectra, mainly cool M-dwarfs.

Figure 5. The distribution of spectroscopically confirmed stars (large blue
points) and galaxies (small red points) in the psf-model vs model i-band
(top) and z-band (bottom) planes selected in the CMASS sample of the
commissioning data. The black lines are the linear cuts that remove the
most spectroscopically confirmed stars whilst removing less than 1% of the
galaxies. The z-band cut was added to the original i-band cut targeting
from chunk 3 onwards.

Fig. 5 displays the distribution of the spectroscopically classified
stars (blue) and galaxies (red) in the $i_{\rm psf} - i_{\rm mod}$ vs $i_{\rm mod}$ and
$z_{\rm psf} - z_{\rm mod}$ vs $z_{\rm mod}$ planes for these commissioning targets. As
expected, the stars preferentially occupy lower values of psf-model
than the galaxies, and so applying a more restrictive cut would remove
more stars but at the expense of removing some galaxies. For
a maximum loss of just 1% of galaxies we found the linear cuts
that would remove the largest numbers of stars. These cuts, shown
as the black lines in Fig. 5, remove 31% and 52% of the stars
that remained in the commissioning data for the i- and z-band
cuts respectively. Since the z-band cut,

$z_{\rm psf} - z_{\rm mod} > 9.125 - 0.46\,z_{\rm mod}$, (21)

performed significantly better, it was added to the original i-band
cut (Eq. 20) for all data from chunk 3 onwards (i.e., after the
commissioning runs), such that targets have to pass both cuts to be
selected. Even though the z-band cut alone removes the vast majority
of the stars excluded by the i-band cut, we kept the i-band cut
in place to ensure that we could apply a consistent star-galaxy
separation throughout the survey. This is achieved simply by retroactively
applying the z-band cut to the commissioning data.

Since these star-galaxy separation criteria measure the compactness
of the objects in the SDSS imaging, their effectiveness
will depend on the imaging PSF. Based on the commissioning data,
Fig. 6 shows how the fraction of stars and galaxies removed by the
new z-band criteria depends on the r-band PSF. The fraction of
galaxies removed is fairly flat at ~ 1% for PSF FWHM < 1.5" and
then rapidly increases at higher FWHMs. The fraction of stars removed
displays the opposite trend. Fig. 6 also presents the numbers
of stars and galaxies as a function of r-band FWHM, demonstrating
that the vast majority of the sample is selected from imaging
with FWHM < 1.5". This slight seeing-dependent star-galaxy separation
will result in the imprint of a spatial dependence in the density
of galaxies across the survey, which can be corrected using
seeing-dependent weights (see Section 6.4 for details).

Figure 6. The dependence of the star-galaxy separation on the FWHM of
the imaging PSF. The top panel shows the fraction, and the bottom panel
the number, of spectroscopically classified stars and galaxies in the
commissioning data that are excluded by the additional z-band star-galaxy
separation as functions of r-band PSF FWHM (in arcsec).

Whilst the above analysis addresses the fraction of galaxies
lost due to the addition of the z-band star-galaxy separation
criteria, it provides no indication of how many compact galaxies
were removed by the original i-band cut. To investigate this issue
we combined the deep coadded SDSS Stripe 82 imaging (Abazajian
et al. 2009; Annis et al. 2014) with near-infrared J- and K-band
imaging from the UKIDSS Large Area Survey Data Release
4 (Lawrence et al. 2007; Casali et al. 2007; Hewett et al. 2006;
Hambly et al. 2008) in order to define a robust set of stars and
galaxies over an area of 150 deg$^2$. A $J - K < 1.1$ colour cut
provides an excellent separation between stars and galaxies in the
colour-magnitude region that the CMASS galaxies occupy. When
this information is combined with the higher S/N measurement of
$z_{\rm psf} - z_{\rm mod}$ from the coadded imaging we can confidently separate
stars and galaxies. Using these data we estimate that the final
star-galaxy separation cuts remove 2.3% of the full sample of galaxies
selected by the CMASS colour cuts.
3.3.3 Summary of CMASS target selection

In summary, the CMASS target selection for the bulk of the survey
is as follows:

$i_{\rm cmod} < 19.86 + 1.6(d_\perp - 0.8)$ (22)

$17.5 < i_{\rm cmod} < 19.9$ (23)

$d_\perp > 0.55$ (24)

$i_{\rm psf} - i_{\rm mod} > 0.2 + 0.2(20 - i_{\rm mod})$ (25)

$z_{\rm psf} - z_{\rm mod} > 9.125 - 0.46\,z_{\rm mod}$ (26)

$r_{\rm mod} - i_{\rm mod} < 2$ (27)

$i_{\rm fib2} < 21.5$ (28)

$r_{{\rm dev},i} < 20.0$ pix. (29)
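Eqs. 22-29 can be collected into one predicate, sketched below. The function and argument names are hypothetical, and $d_\perp$ is assumed to have been computed already from the model colours as defined earlier in the paper; the sketch covers only the final selection, not the early-chunk exceptions of Section 3.3.1.

```python
def is_cmass_target(i_cmod, i_mod, i_psf, z_mod, z_psf,
                    r_mod, i_fib2, d_perp, r_dev_i):
    """Return True if an object passes the CMASS cuts of Eqs. 22-29."""
    cuts = (
        i_cmod < 19.86 + 1.6 * (d_perp - 0.8),       # Eq. 22: sliding colour-magnitude cut
        17.5 < i_cmod < 19.9,                        # Eq. 23: magnitude limits
        d_perp > 0.55,                               # Eq. 24: discard low-redshift galaxies
        (i_psf - i_mod) > 0.2 + 0.2 * (20 - i_mod),  # Eq. 25: i-band star-galaxy cut
        (z_psf - z_mod) > 9.125 - 0.46 * z_mod,      # Eq. 26: z-band star-galaxy cut
        (r_mod - i_mod) < 2.0,                       # Eq. 27: deblending outliers
        i_fib2 < 21.5,                               # Eq. 28: fibre magnitude limit
        r_dev_i < 20.0,                              # Eq. 29: de Vaucouleurs radius in pixels
    )
    return all(cuts)
```

Writing the cuts as a tuple keeps each condition next to its equation number, which makes it easy to swap in the looser early-chunk values when reproducing those selections.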

During commissioning (chunks 1 and 2), we used significantly
looser criteria; the CMASS_COMM sample
(BOSS_TARGET1 && 2^2), just under 25000 galaxies, was selected
as follows:

$i_{\rm cmod} < 20.14 + 1.6(d_\perp - 0.8)$ (30)

$17.5 < i_{\rm cmod} < 20.0$ (31)

$d_\perp > 0.55$ (32)

$i_{\rm psf} - i_{\rm mod} > 0.2 + 0.2(20 - i_{\rm mod})$ (33)

$r_{\rm mod} - i_{\rm mod} < 2$ (34)

$i_{\rm fib2} < 22$ (35)

See other exceptions to these criteria in Section 3.3.1.


3.4 Sparse Sampling Cuts 

Motivated by the wish to study objects of slightly lower stellar mass
and bluer intrinsic colour, we designed the CMASS_SPARSE sample.
It extends the CMASS selection by altering the $i_{\rm cmod}$-$d_\perp$ sliding
colour-magnitude cut to

$i_{\rm cmod} > 19.86 + 1.6(d_\perp - 0.8)$ (37)

$i_{\rm cmod} < 20.14 + 1.6(d_\perp - 0.8)$, (38)

with the other cuts unchanged (i.e., the area between the red and
green dashed lines in the bottom panel of Fig. 4). These galaxies
were randomly subsampled down to a number density on the sky
of 5 deg$^{-2}$, corresponding to approximately 1 in 10 targets. This
sample was selected across the full BOSS footprint.
CMASS_SPARSE galaxies may be selected with

• BOSS_TARGET1 && 2^3

• SPECPRIMARY == 1

• ZWARNING_NOQSO == 0

• CLASS_NOQSO == 'GALAXY'

after excluding the commissioning chunks.

Altering the CMASS target selection in this way produces a 
sample of galaxies at somewhat lower redshift and stellar mass. The 
median redshift of CMASS_SPARSE is $z = 0.51$, with a stellar
mass distribution that peaks at 10 ^^'^ Mq (using the stellar masses 
of Chen et al. 2012), relative to the peak CMASS mass of 10^^ '* 
Mq. 


4 SPECTROSCOPIC OBSERVATIONS

4.1 Previously known redshifts

Fractions of the LOWZ and CMASS targets have a previous robust
object classification and redshift determined from the SDSS-II
survey (York et al. 2000; Abazajian et al. 2009). We therefore
matched our target sample to a sample of "known objects" with
predetermined secure classifications and redshifts and did not
spectroscopically reobserve these galaxies within BOSS. This subsample
of targets has a complicated angular distribution on the sky:
the majority of the NGC was covered by SDSS-II, but only a few
stripes in the SGC were observed. These pre-observed targets account
for 43% (9%) of the LOWZ targets in the north (south). A
much smaller fraction of CMASS targets were pre-observed: 1.7%
(0.7%) in the north (south).


4.2 Target Collation and Spectroscopic Tiling

We start with the list of targets provided by the target selection
algorithms detailed above, and remove targets with known redshifts
as defined above. The tiling algorithm assigns the remaining targets
to spectroscopic tiles. The sky was tiled in a piecemeal fashion as
the survey progressed; each of these regions is called a "chunk";
see § 2.1 and Dawson et al. (2012) for further details. DR12 contains
observations from 38 chunks. The survey mask and collated
target catalogue both indicate the chunk to which a region or specific
object was assigned.

The tiling algorithm (Blanton et al. 2003) determines the location
of the 3° diameter spectroscopic tiles and allocates the available
fibres among the targets, including targets from other programmes
within BOSS. Because of the size of the cladding on the
fibres, fibres may not lie within 62" of one another on a given
spectroscopic tile. The algorithm therefore divides target galaxies into
friends-of-friends groups with a linking length of 62", and then assigns
fibres to the groups in a way that maximizes the number of
targets with fibres. The choice of which galaxies are assigned fibres
is otherwise random. The algorithm adapts to the density of targets
on the sky, such that regions with a larger than average number
density tend to be covered by more than one tile. For the DR12
sample, 42% (55%) of the area in the north (south) is covered by
multiple tiles, and the number density of CMASS targets is larger
by 4.7% (3.4%) in those regions. The tile overlap - target density
correlation is less pronounced for the LOWZ sample (1.6% and
2.4% enhancement in north and south, respectively). The LOWZ
sample constitutes only 35% of the galaxy targets, and particularly
in the north many galaxies in dense regions already have spectra
from SDSS-II and thus were not targeted for SDSS-III BOSS
spectroscopy (see Sec. 4.1).

Fibre collisions are partially resolved only in the multiple
tile regions, and therefore may not be representative of the unresolved
fibre collisions in lower target density regions. Fibre-collided
galaxies cannot simply be accounted for by reducing the
completeness of their sector, since they are a non-random subset
of targets (conditioned to have another target within 62"). As discussed
further in Sec. 6.1, we provide a set of weights that treat
these objects as if they were observed, and assign their weight to
the nearest object of the same target class. Finally, since quasar targets
are given higher priority by the tiling algorithm, we account for
their presence by simply including a 62" veto mask (see Sec. 5.1.1)
around each high priority quasar target.
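The friends-of-friends grouping with a 62" linking length can be illustrated with a small union-find sketch. This is not the tiling code of Blanton et al. (2003): the function names are hypothetical, positions are assumed to be (RA, Dec) pairs in degrees, the pair search is a simple O(N²) loop, and a strict "less than" comparison is assumed for the collision radius.

```python
import math

LINK_ARCSEC = 62.0  # fibre collision radius quoted in the text


def angular_sep_arcsec(ra1, dec1, ra2, dec2):
    """Great-circle separation in arcsec (haversine formula), inputs in degrees."""
    ra1, dec1, ra2, dec2 = map(math.radians, (ra1, dec1, ra2, dec2))
    sin_ddec = math.sin((dec2 - dec1) / 2.0)
    sin_dra = math.sin((ra2 - ra1) / 2.0)
    a = sin_ddec ** 2 + math.cos(dec1) * math.cos(dec2) * sin_dra ** 2
    return math.degrees(2.0 * math.asin(min(1.0, math.sqrt(a)))) * 3600.0


def collision_groups(coords, link=LINK_ARCSEC):
    """Group targets whose separations chain together within the linking length.

    Returns a sorted list of groups, each a list of indices into coords.
    """
    parent = list(range(len(coords)))

    def find(i):
        # Follow parent pointers to the group root, compressing the path.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(len(coords)):
        for j in range(i + 1, len(coords)):
            if angular_sep_arcsec(*coords[i], *coords[j]) < link:
                parent[find(i)] = find(j)  # merge the two groups

    groups = {}
    for i in range(len(coords)):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())
```

Note that two targets 100" apart end up in the same group if a third target lies within 62" of both, which is why collided objects form a non-random subset of the targets.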
Figure 7. Normalised $i_{\rm fib2}$ distributions of redshift failures (green, dashed)
and redshift successes (red, solid) for the CMASS sample. Redshift failures
constitute 1.8% of the CMASS targets observed by SDSS-III BOSS. These
are contrasted against normalised distributions for the LOWZ sample of
redshift failures (pink, dotted) and redshift successes (blue, dashed). Error
bars were calculated assuming Poisson statistics. Note that some LOWZ
galaxies have $i_{\rm fib2} < 19.5$, which is why the normalisation for LOWZ
curves looks lower than for CMASS.

4.3 Spectroscopic Reductions 

Each "tile" output from the tiling algorithm specifies a central location
on the sky and the list of targets to be observed. Physical plates
are drilled at the University of Washington based on the anticipated
airmass of observation. Multiple plates can cover the same tile, and
plates may be observed on multiple nights until the desired
signal-to-noise ratio is reached (Dawson et al. 2012).

The BOSS spectroscopic reduction pipeline is detailed in
Bolton et al. (2012), with minor updates given in Alam et al. (2015).
The final DR12 catalogues used the v5_7_0 tag of the IDLSPEC2D
software package⁵ for spectroscopic calibration, extraction,
classification, and redshift analysis. We restrict the large-scale structure
catalogues to only include data from plates with PLATEQUALITY
set to "good". The criteria for this designation are a minimum of
three exposures, fewer than 10% of spectroscopic pixels flagged as
bad, and a minimum signal-to-noise ratio in both the blue and red
arms of the spectrograph (Dawson et al. 2012).

The classification and redshift of each object are determined
by a maximum likelihood fit of the coadded spectra to a linear
combination of redshifted eigenspectra in combination with a
low-order polynomial. The polynomial (quadratic for galaxies, quasars,
and cataclysmic variable stars; cubic for all other stars) allows for
residual extinction effects or broadband continua not otherwise
described by the templates. The templates are derived from a
rest-frame principal-component analysis (PCA) of training samples of
galaxies, quasars and stars using stellar population templates at the
BOSS resolution (from Maraston et al. 2013). The reduced $\chi^2$ versus
redshift is measured in redshift steps corresponding to the logarithmic
pixel scale of the spectra, where $\Delta\log_{10}(\lambda) = 0.0001$.
Galaxy templates are fit from $z = -0.01$ to 1.00, quasar templates
from $z = 0.0033$ to 7.00, and star templates from $z = -0.004$ to
0.004 (±1200 km s$^{-1}$). The template fit with the best reduced $\chi^2$ is
selected as the classification and redshift, with warning flags set for
poor wavelength coverage, broken/dropped and sky-target fibres,
and best fits which are within $\Delta\chi^2/{\rm dof} = 0.01$ of the next best
fit (comparing only to fits with a velocity difference of more than
1000 km s$^{-1}$). This method is a development of that used for the
SDSS DR8 (Aihara et al. 2011), and is explained in further detail
in Bolton et al. (2012), and in Ahn et al. (2012, 2014).

⁵ http://www.sdss3.org/svn/repo/idlspec2d/tags/v5_7_0/
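The velocity scales quoted above can be checked with a short calculation. The sketch below is illustrative only: it assumes the low-redshift approximation v ≈ cz, and uses the fact that a fixed step in log10(λ) corresponds to a fixed velocity step c ln(10) Δlog10(λ). The function names are hypothetical.

```python
import math

C_KMS = 299792.458  # speed of light in km/s


def velocity_per_pixel(dlog10_lambda=1.0e-4):
    """Velocity width of one logarithmic pixel, c * ln(10) * dlog10(lambda)."""
    return C_KMS * math.log(10.0) * dlog10_lambda


def velocity_from_redshift(z):
    """Low-redshift approximation v = c z (adequate for |z| << 1)."""
    return C_KMS * z
```

With these definitions the pixel scale of 0.0001 in log10(λ) is about 69 km/s per pixel, and the stellar template range |z| ≤ 0.004 corresponds to roughly ±1200 km/s, as stated in the text.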

For galaxy targets, a dominant source of false identifications
is due to quasar templates with unphysical fit parameters, e.g., large
negative amplitudes causing a quasar template emission feature to
fit a galaxy absorption feature. Thus, for galaxy targets, the best
classification and redshift are selected only from the fits to galaxy
and star templates, and we restrict the sample to fits the pipeline
classifies as robust. The results of these fits are tabulated in the
"*_NOQSO" versions of various quantities in the LSS catalogues.

Table 2 lists the total number of CMASS and LOWZ targets
that were assigned a fibre within the survey footprint ($N_{\rm obs}$) as
well as the breakdown for each of the three possible outcomes: the
number of CMASS and LOWZ targets robustly classified as stars
($N_{\rm star}$) or galaxies ($N_{\rm gal}$), and the number of targets for which the
pipeline failed to find a robust classification and redshift ($N_{\rm fail}$). A
total of 2.3% (3.4%) of CMASS targets are stars and 1.6% (2.1%)
are redshift failures in the north (south). Only 0.6% of LOWZ targets
are stars and 0.5% are redshift failures.

Fig. 7 demonstrates that the pipeline is less likely to obtain
a successful redshift for CMASS targets with fainter $i_{\rm fib2}$ magnitudes.
Section 6.3 discusses how we account for this strong dependence
in the redshift failure weights.


5 LARGE SCALE STRUCTURE CATALOGUE 
CREATION 

The creation of the BOSS large-scale structure catalogues involves
a number of steps. We start with a list of targets based on the target
selection procedure described above, with the previously known
redshifts and outcome of the spectral analysis for each object for
which we have a spectrum, matched to this list. Next we construct
the survey mask, which specifies the regions of the sky that will be
included in the LSS catalogues and the completeness in each included
region. Finally, we use the mask and observed redshifts to
generate a set of "random" galaxies, Poisson sampling the sky coverage
specified by the mask with the same expected density distribution
as the galaxies. The random galaxies are assigned redshifts
to match the distribution of the target sample. Together, the data
and random catalogues can be used for statistical analyses such as
N-point functions. These steps and some of the subtleties involved
are now described in detail.


5.1 Mask 

We use the MANGLE software (Swanson et al. 2008) to track the
areas covered by the BOSS survey and the angular completeness of
each distinct region; our terminology is summarised in Table 1. The
mask is constructed of spherical polygons, which form the base unit
for the geometrical decomposition of the sky. The angular mask of
the survey is formed from the intersection of the imaging boundaries
(expressed as a set of polygons) and the spectroscopic tiles.
We define each unique intersection of spectroscopic tiles to be a
sector (see Blanton et al. 2003; Tegmark et al. 2004; Aihara et al.
2011).

Sample                       CMASS     CMASS     CMASS      LOWZ      LOWZ      LOWZ    LOWZE2    LOWZE3
Property                       NGC       SGC     total       NGC       SGC     total       NGC       NGC

N_gal                      607,357   228,990   836,347   177,336   132,191   309,527     2,985    11,195
N_known                     11,449     1,841    13,290   140,444    13,073   153,517     2,730     6,371
N_star                      14,556     8,262    22,818     1,043       976     2,019        24        61
N_fail                      10,188     5,157    15,345       868       602     1,470        21        55
N_cp                        34,151    11,163    45,314     4,459     4,422     8,881        16       167
N_missed                     7,997     3,488    11,485    10,295     3,499    13,794       114       609
N_used                     568,776   208,426   777,202   248,237   113,525   361,762     4,336    15,380
N_obs                      632,101   242,409   874,510   179,247   133,769   313,016     3,030    11,311
N_targ                     685,698   258,901   944,599   334,445   154,763   489,208     5,890    18,458
Total area (deg^2)           7,429     2,823    10,252     6,451     2,823     9,274       144       834
Veto area (deg^2)              495       263       759       431       264       695        10        55
Used area (deg^2)            6,934     2,560     9,493     6,020     2,559     8,579       134       779
Effective area (deg^2)       6,851     2,525     9,376     5,836     2,501     8,337       131       755
Targets / deg^2               98.9     101.1      99.5      55.6      60.5      57.0      43.4      23.5

Table 2. Basic parameters of the DR12 CMASS, LOWZ, LOWZE2, and LOWZE3 samples. We track these classifications on a sector-by-sector basis in order
to compute the BOSS fibre completeness in each sector of the survey. In this table we report $N_x = \sum_{\rm sectors} N_{x,i}$, the sum over all sectors retained in the
final BOSS mask. Target classification counts and areas for the LOWZE2 and LOWZE3 samples are reported for chunk 2 and chunk 3-6, respectively. To
estimate the target density for those samples, we use the full NGC footprint to reduce cosmic variance.

We compute sector completeness based on the distribution of
targets across various outcomes of the tiling pipeline and spectroscopic
reductions. In each sector (indexed by i) included in the
large scale structure catalogue, we distinguish the following outcomes
(separately for each target class):

(i) galaxies with redshifts from good BOSS spectra (we denote
the number in each sector by $N_{{\rm gal},i}$),

(ii) galaxies with redshifts from pre-BOSS spectra ($N_{{\rm known},i}$),

(iii) spectroscopically-confirmed stars ($N_{{\rm star},i}$),

(iv) objects with BOSS spectra from which stellar classification
or redshift determination failed ($N_{{\rm fail},i}$),

(v) objects with no spectra, in a fibre collision group with at least
one object of the same target class ($N_{{\rm cp},i}$),⁶

(vi) objects with no spectra, if in a fibre collision group then
with no other objects from the same target class ($N_{{\rm missed},i}$).

These quantities, summed over all sectors included in the LSS catalogues,
are given in Table 2. As each target is classed by one of
these descriptors, we have that the total number of targets in sector
i is

$N_{{\rm targ},i} = N_{{\rm star},i} + N_{{\rm gal},i} + N_{{\rm fail},i} + N_{{\rm cp},i} + N_{{\rm missed},i} + N_{{\rm known},i}$, (39)

and we define the number of targets observed by BOSS as

$N_{{\rm obs},i} = N_{{\rm star},i} + N_{{\rm gal},i} + N_{{\rm fail},i}$. (40)
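The bookkeeping of Eqs. 39 and 40 can be verified against the summed counts in Table 2. The sketch below uses the CMASS NGC column; the dictionary layout and function names are illustrative choices, and the counts are sums over sectors rather than values for a single sector.

```python
# Summed CMASS NGC counts from Table 2.
cmass_ngc = {
    "gal": 607357,
    "known": 11449,
    "star": 14556,
    "fail": 10188,
    "cp": 34151,
    "missed": 7997,
}


def n_obs(counts):
    """Eq. 40: targets with BOSS spectra (stars, galaxies, redshift failures)."""
    return counts["star"] + counts["gal"] + counts["fail"]


def n_targ(counts):
    """Eq. 39: all targets, adding collided, missed and previously known objects."""
    return n_obs(counts) + counts["cp"] + counts["missed"] + counts["known"]
```

Evaluating these for the CMASS NGC column reproduces the N_obs and N_targ entries of Table 2 (632,101 and 685,698).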

Matching our analyses for DR9, DR10 and DR11, the LOWZ catalogue
is then cut to $0.15 < z < 0.43$, and the CMASS catalogue
is cut to $0.43 < z < 0.7$ to avoid overlap, and to make the samples
independent. The number of galaxies used in the final catalogue,
$N_{\rm used}$, is the subset of $N_{{\rm gal},i} + N_{{\rm known},i}$ that pass these redshift
cuts.

From these descriptions, we define a BOSS fibre completeness
in sector i,

$C_{{\rm BOSS},i} = \frac{N_{{\rm obs},i} + N_{{\rm cp},i}}{N_{{\rm star},i} + N_{{\rm gal},i} + N_{{\rm fail},i} + N_{{\rm cp},i} + N_{{\rm missed},i}}$. (41)

This completeness definition excludes the "known" objects observed
by SDSS-II. $C_{{\rm BOSS},i}$, shown in Fig. 8, is recorded in the
MANGLE mask files released with the LSS catalogues and is used
in the random catalogue generation (see Sec. 5.2). By this definition,
the area-weighted average completeness is 99% (97%) for the
CMASS (LOWZ) samples. We compute the effective mask area in
Table 2 by weighting the used area of each sector by its completeness.

⁶ cp is used because each galaxy exists in a "close-pair" with another.

The boundaries of the spectroscopic tiles can be seen by eye 
in Fig. 8 as discontinuities in the value of completeness; the unique 
intersections of those tiles define individual sectors, in which we 
treat the BOSS fibre completeness as uniform. On average, the 
completeness is larger in regions covered by more than one 
spectroscopic tile. The raw sky area covered by spectroscopic tiles is 
10338 deg$^2$, of which 10252 deg$^2$ remain (7429 deg$^2$ in the NGC 
and 2823 deg$^2$ in the SGC) after restricting the mask to sectors 
for which every planned tile has been observed with “good” 
PLATEQUALITY. 

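We also define a galaxy redshift completeness, assuming that 
stars are always correctly classified spectroscopically, 

$$C_{\mathrm{red},i} = \frac{N_{\mathrm{gal},i}}{N_{\mathrm{obs},i} - N_{\mathrm{star},i}}, \qquad (42)$$

and define a target completeness 

$$C_{\mathrm{targ},i} = \frac{N_{\mathrm{gal},i} + N_{\mathrm{known},i}}{N_{\mathrm{targ},i}}, \qquad (43)$$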


which gives the number of good galaxies spectroscopically observed 
in BOSS combined with previously known redshifts, divided 
by the number of targets in each sector. Fig. 9 shows the 
fraction of the total BOSS area that has target completeness greater 
than a specified value, and how this would change if we could ignore 
various effects. This shows the relative importance of different 
categories of targets to the target completeness of BOSS, from 
redshift failures, which are the least important, to fibre collisions, 
which are the most important. 
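As an illustration of how the per-sector quantities in Eqs. (39) to (43) fit together, the following minimal Python sketch computes the three completeness values from the six outcome counts. The class and function names are hypothetical and are not taken from MKSAMPLE; returning NaN for an empty denominator is likewise an assumption of this sketch.

```python
from dataclasses import dataclass


@dataclass
class SectorCounts:
    """Outcome counts for one target class in one sector (names are illustrative)."""
    n_gal: int      # good BOSS galaxy redshifts
    n_known: int    # pre-BOSS ("known") redshifts
    n_star: int     # spectroscopically confirmed stars
    n_fail: int     # BOSS spectra with failed classification/redshift
    n_cp: int       # no spectrum, close pair with same-class target
    n_missed: int   # no spectrum, no same-class collision partner


def _ratio(num, den):
    # Assumption of this sketch: undefined ratios are reported as NaN.
    return num / den if den > 0 else float("nan")


def sector_completeness(c):
    """Return N_targ, N_obs and the completeness values of Eqs. (39)-(43)."""
    n_obs = c.n_star + c.n_gal + c.n_fail                      # Eq. (40)
    n_targ = n_obs + c.n_cp + c.n_missed + c.n_known           # Eq. (39)
    c_boss = _ratio(n_obs + c.n_cp, n_obs + c.n_cp + c.n_missed)  # Eq. (41)
    c_red = _ratio(c.n_gal, n_obs - c.n_star)                  # Eq. (42)
    c_targ = _ratio(c.n_gal + c.n_known, n_targ)               # Eq. (43)
    return {"n_targ": n_targ, "n_obs": n_obs,
            "c_boss": c_boss, "c_red": c_red, "c_targ": c_targ}
```

Note that the "known" objects enter $N_{\mathrm{targ},i}$ and $C_{\mathrm{targ},i}$ but not $C_{\mathrm{BOSS},i}$, as stated in the text.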

Previous LSS catalogues (DR9, DR10, DR11) had to deal with 
sizeable regions where BOSS spectra were not complete, and we 
made a number of cuts on sectors to include in the LSS catalogues 
to minimise the impact of this effect. In particular, sectors meeting 
any of the following criteria were removed from the LSS mask: 


• $C_{\mathrm{BOSS},i} < 0.7$ (Eqn. 41); removing part-complete sectors on 
the edges of the survey missing a significant fraction of redshifts. 

• $C_{\mathrm{red},i} < 0.8$ (Eqn. 42) and $N_{\mathrm{gal},i} > 10$; removing regions 
with bad spectroscopic observations. 

• $N_{\mathrm{obs},i} = 0$ and there is not another sector within 2° in the ± 
right ascension or declination directions; removing isolated regions 
without galaxies. 

Figure 8. Completeness maps for both the LOWZ and CMASS samples in the north and south Galactic caps. The mean completeness is 98.8% for the CMASS 
sample shown in the left panels, and 97.2% for the LOWZ sample in the right-hand panels. Gaps correspond to early chunks as shown in Fig. A1. Each patch 
of different colour corresponds to a plate, with the colour determined by the completeness of that plate. This is surrounded by the higher completeness regions 
that overlap that plate with other plates. This leaves a pattern that looks like a darker, higher-completeness mesh, covering the survey. 

These cuts were not applied to the DR12 sample. If we had 
additionally applied the fibre completeness cut (first criterion above), 
for DR12 we would have rejected an additional 30 (56) deg$^2$ from 
the CMASS (LOWZ) mask; if instead we had applied the redshift 
success cut in DR12 (second criterion above), we would have 
rejected an additional 1.7 (1.4) deg$^2$ from the CMASS (LOWZ) 
mask. The difference between the earlier mask selection and the 
algorithm described above applied to DR12 constitutes a negligible 
change to the survey mask. The two algorithms agree to within 
0.3% of the total mask area for both the CMASS and LOWZ samples. 
Finally, the classification of $N_{\mathrm{cp},i}$ and $N_{\mathrm{missed},i}$ has slightly 
changed in DR12 relative to DR9-DR11; see Sec. 6.1. 


5.1.1 Veto Masks 

While the basic geometry of the survey is encapsulated in the survey 
mask described in the previous sections, there remain many 
small regions within it where we could not have observed galaxies. 
Although they are individually small, they are not randomly 
distributed across the sky and sum to a significant area, and so we 
exclude them from any analysis. We represent those regions by a set 
of veto masks, and remove “randoms” that fall within these masks. 
The masks are: 

• Centerpost mask: Each Sloan plate is secured to the focal 
plane by a central bolt: no targets coinciding with the centerpost 
of a spectroscopic tile can be observed. This mask reduces the survey 
area by 0.04%. 

• Collision priority mask: Ly$\alpha$ quasar targets receive higher 
priority than BOSS galaxy targets in the tiling algorithm; in regions 
of only a single spectroscopic tile, BOSS galaxy targets are 
unobservable within a fibre collision radius (62") of those targets. 
Treating the high-priority quasar target locations as uncorrelated 
with the galaxy density field and neglecting any recovered galaxy 
targets in tile overlap regions, we can simply account for the 
high-priority quasars by masking a 62" radius around each. This mask 
reduces the survey area by 1.5%. 

• Bright stars mask: We mask an area around stars in the Tycho 
catalogue (Høg et al. 2000) with Tycho $B_T$ magnitude within 
[6, 11.5] with magnitude-dependent radius 

$$R = (0.0802 B_T^2 - 1.860 B_T + 11.625)\ \mathrm{arcmin}. \qquad (44)$$

This mask reduces the area by 1.9%. 

• Bright objects mask: The standard bright star mask occasionally 
misses some bright stars that impact the SDSS imaging data 
quality. Additionally, a small number of bright local galaxies saturate 
the imaging as well, affecting target selection in their outskirts. 
These objects were identified by visual inspection, and the mask 
radii for each object were also determined in this manner, ranging 
from 0.1° to 1.5°. The number of objects in this mask is ~125, 
subtending a total area of 43.8 deg$^2$. The list of objects is described 
in section 2.1 of Rykoff et al. (2014). This mask covers 0.4% of the 
BOSS area. 

• Non-photometric conditions mask: We mask regions where 
the imaging was not photometric in the g, r, or i bands, the PSF 
modelling failed, the imaging reduction pipeline timed out (usually due 
to too many blended objects in a single field, caused by a high stellar 
density), or the image was identified as having any other critical 
problems. This mask reduces the area by 3.4%. 

• Seeing cut: we discard regions where the point spread function 
full width at half maximum (labeled 'PSF_FWHM' in the catalogues) 
is greater than 2.3, 2.1, 2.0 arcsec in the g, r, and i bands, respectively. 
The rationale for this cut is to decrease the variation of target density 
and properties with seeing due to the star-galaxy separation 
(Eqns. 12, 20, and 21) and $i_{\mathrm{fib2}}$ cuts. This cut removes an 
additional 0.5% (1.7%) of the NGC (SGC) footprint. 

• Extinction cut: for similar reasons, we also discard areas 
where the $E(B-V)$ extinction (labeled 'EB_MINUS_V' in the 
catalogues, from Schlegel, Finkbeiner & Davis 1998) exceeds 0.15. 
This cut removes an additional 0.06% (2.2%) of the NGC (SGC) 
footprint. 

Figure 9. The fraction of the total survey area that has a target completeness 
greater than the value shown, where target completeness is defined 
as the number of good galaxies spectroscopically observed in BOSS and 
those with previously known redshifts divided by the number of targets 
calculated in each sector, $C_{\mathrm{targ},i} = (N_{\mathrm{gal},i} + N_{\mathrm{known},i})/N_{\mathrm{targ},i}$, 
as in Eq. (43). We compare this completeness with those we would have 
obtained had we not had to include various classes of targets. If there 
had been no stars in our target list, the completeness would have been 
$(N_{\mathrm{gal},i} + N_{\mathrm{known},i})/(N_{\mathrm{targ},i} - N_{\mathrm{star},i})$ (green line). If additionally we 
had not had to deal with fibre collisions, we would have observed a completeness 
$(N_{\mathrm{gal},i} + N_{\mathrm{known},i} + N_{\mathrm{cp},i})/(N_{\mathrm{targ},i} - N_{\mathrm{star},i})$ (blue line), 
and if additionally there were no redshift failures, 
$(N_{\mathrm{gal},i} + N_{\mathrm{known},i} + N_{\mathrm{cp},i} + N_{\mathrm{fail},i})/(N_{\mathrm{targ},i} - N_{\mathrm{star},i})$ (black line). From the definition of 
$N_{\mathrm{targ},i}$ in Eq. (39) we see that the remaining decrement of the black line 
from $C_{\mathrm{targ},i} = 1$ is due to missed galaxies, $N_{\mathrm{missed},i}$. 

In the catalogue creation pipeline, the list of targets is immediately 
passed through these veto masks, so that targets in vetoed 
regions do not contribute to the sector completeness calculation. 
All random galaxies within the veto regions must also be removed. 
Table 2 shows that in total, 6.6% (9.3%) of the area within the north 
(south) Galactic cap footprint was removed by the veto masks. 
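Two of the veto criteria above are simple enough to sketch in a few lines of Python: the magnitude-dependent bright star radius and the seeing and extinction cuts. This is a hedged illustration only. It assumes that the leading term of Eq. (44) is quadratic in $B_T$ and that the seeing thresholds are in arcsec; the function names are hypothetical and do not correspond to MKSAMPLE routines.

```python
# Thresholds quoted in the text; units of arcsec are an assumption of this sketch.
SEEING_MAX_GRI = (2.3, 2.1, 2.0)
EBV_MAX = 0.15


def bright_star_radius_arcmin(b_t):
    """Veto radius around a Tycho star, or None outside the masked B_T range."""
    if not 6.0 <= b_t <= 11.5:
        return None
    # Assumed quadratic form of Eq. (44).
    return 0.0802 * b_t ** 2 - 1.860 * b_t + 11.625


def passes_seeing_and_extinction(psf_fwhm_gri, ebv):
    """True if a position survives both the seeing cut and the extinction cut."""
    seeing_ok = all(s <= s_max for s, s_max in zip(psf_fwhm_gri, SEEING_MAX_GRI))
    return seeing_ok and ebv <= EBV_MAX
```

With these assumptions the radius shrinks from about 3.4 arcmin at $B_T = 6$ to about 0.8 arcmin at $B_T = 11.5$. The same predicates would be applied to both targets and randoms, since the text requires both to be removed inside vetoed regions.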


5.2 Random Catalogue generation 

All of our clustering analyses make use of random catalogues with 
the same angular and redshift selection functions as the data. To 
produce these catalogues, we first use the MANGLE ransack command 
to generate one ~10× and two ~50× catalogues, where 
the angular density of the random galaxies is proportional to the 
completeness value in the mask for each sector^. As the random 
catalogue follows the redshift completeness per sector, it automatically 
corrects for any systematic effects caused by the decrease 
in fiducial exposure times starting roughly half-way through the 
BOSS survey. Next we remove random galaxies using the set of 
veto masks described in Sec. 5.1.1. Only the angular coordinates 
of the 10× random catalogue are used to fit for angular systematic 
weights; see Sec. 6.4. Since the true underlying redshift distribution 
of our targets is unknown and can only be estimated from 
the empirical redshift distribution, we assign redshifts to the galaxies 
in the two 50× random catalogues by randomly drawing from 
the measured galaxy redshifts, but with a weight for each galaxy 
given by $w_{\mathrm{tot},i}$, defined in Eq. (50). This procedure ensures that 
the (weighted) galaxy and random catalogues have exactly the same 
redshift distribution, apart from (small) stochasticity from the random 
redshift assignment. Ross et al. (2012) compare this random 
redshift assignment scheme with approaches that fit a spline of 
varying knot number to the measured galaxy redshift distribution, 
and then sample from the resulting spline directly. Based on analysis 
of mock catalogues, their figure 19 demonstrates that the former 
method provides the smallest bias in fits to the monopole and 
quadrupole correlation function. 
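The weighted redshift assignment for the randoms can be sketched as a weighted draw with replacement from the observed galaxy redshifts. The snippet below is a minimal illustration using NumPy, not the MKSAMPLE implementation; the function name and the `seed` argument are assumptions.

```python
import numpy as np


def assign_random_redshifts(z_gal, w_tot, n_random, seed=0):
    """Draw n_random redshifts from z_gal with probability proportional to w_tot."""
    rng = np.random.default_rng(seed)
    z_gal = np.asarray(z_gal, dtype=float)
    w_tot = np.asarray(w_tot, dtype=float)
    prob = w_tot / w_tot.sum()  # normalise the total galaxy weights
    return rng.choice(z_gal, size=n_random, replace=True, p=prob)
```

Because each random inherits a measured galaxy redshift, the weighted galaxy and random redshift distributions agree up to sampling noise, which is the property described in the text.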


6 ACCOUNTING FOR OBSERVATIONAL ARTEFACTS 
IN LSS CATALOGUES 

In this section we describe in detail how we weight the targeted 
galaxies when computing LSS statistics, in order to minimize the 
impact of observational artefacts on our estimate of the true galaxy 
overdensity field. We identify various effects that affect the 
completeness of the sample, which we quantify with weights applied 
per sector. These weights are a development of those presented in 
Anderson et al. (2012, 2014b). In particular, we discuss treatment 
of “known” redshifts from SDSS-II that were not re-observed in 
SDSS-III BOSS, galaxies not observed due to fibre collisions, 
observed galaxies for which a robust redshift was not obtained, and a 
weighting scheme to null non-cosmological fluctuations imprinted 
on the catalogue by the target selection step. The weights described 
below are available for each galaxy in the LSS catalogues. In this 
section we also summarise weights we apply to minimize our 
statistical error on the observed power spectrum. 

6.1 Fibre collision corrections 

Galaxies that were not assigned a spectroscopic fibre due to fibre 
collisions are not a random subsample of the full target sample 
since they are within a fibre collision radius (62") of another target. 
This is potentially a large effect: in the SGC, where the coverage 
of known targets from SDSS-II is lowest, approximately 20% of 
galaxy targets are in a collision group containing other CMASS or 
LOWZ galaxy targets. As a result, 5.8% of CMASS targets and 
3.3% of LOWZ targets were not assigned a spectroscopic fibre. 

^ To exactly reproduce the officially released random catalogues, one must 
use the ransack version included in the SDSS idlutils product with version 
v5_4_25 or higher (Surhud More, private communication). Random seeds 
input to ransack are provided in the catalogue generation scripts 
accompanying MKSAMPLE. 

These objects preferentially occupy denser environments and 
therefore have higher than average large-scale bias. They are also 
more likely than average to occupy the same dark matter halo as 
a neighbouring galaxy target. Accurate fibre collision corrections 
are therefore particularly important for applications relying on the 
absolute value of galaxy bias (i.e., in a comparison of the lensing 
and clustering amplitude) or those that use small-scale clustering to 
deduce halo occupation statistics and satellite fractions. 

In the default large-scale structure catalogue that focuses on 
obtaining unbiased galaxy density fields on large scales, we simply 
upweight the nearest galaxy from the same target class that was 
assigned a fibre to account for collided galaxies that were not 
assigned fibres. This information is tracked by incrementing a weight 
$w_{\mathrm{cp}}$, labelled WEIGHT_CP in the DR12 LSS catalogues. The 
upweighted nearest neighbour could be classified by the spectroscopic 
pipeline as a good galaxy redshift, a star, or a redshift failure. 
Upweighting the neighbour without reference to its classification is 
the appropriate thing to do as the missed object could be in any of 
these classes. 

We correct 34151 (11163) CMASS targets and 4459 (4422) 
LOWZ targets by nearest neighbour upweighting in the NGC 
(SGC). This amounts to 5.0% (4.3%) of CMASS targets in the 
NGC (SGC), and 1.3% (2.9%) of the LOWZ targets. The difference 
between the hemispheres is due both to higher tile density 
in the SGC (so more fibre collisions fall in overlap regions where 
they can be partially resolved) and to most of the previously known 
SDSS-II redshifts falling in the NGC. 

The algorithm used to generate the DR12 catalogues differs 
slightly from the one used for the DR9-DR11 catalogues. The new 
algorithm uses the output from the tiling algorithm to determine 
membership in fibre collision groups. Targets with the same 
'FINALN' and 'INGROUP' field flags output from the tiling code 
share a collision group. We choose the nearest object of the same 
target class and collision group to carry the weight of the unobserved 
target. We also allow “known” galaxies to carry the weight 
if they are closer than all BOSS-observed targets. In the DR9-DR11 
catalogues, we did not refer to the fibre collision group indices, but 
simply identified collision pairs in the same target class if they were 
separated by less than 62". Nonetheless, the two algorithms select 
the same nearest neighbour ~94% of the time. 
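The DR12 nearest neighbour upweighting for fibre collisions can be sketched as follows. This is an illustrative Python outline, not the MKSAMPLE code: each target is a dictionary whose `group` key is a hypothetical identifier standing in for the combination of the 'FINALN' and 'INGROUP' tiling flags, and `has_fibre` is assumed to be true for any object (BOSS-observed or "known") that can carry weight.

```python
import math


def angular_sep_deg(ra1, dec1, ra2, dec2):
    """Great-circle separation in degrees (haversine formula)."""
    r1, d1, r2, d2 = map(math.radians, (ra1, dec1, ra2, dec2))
    h = (math.sin((d2 - d1) / 2) ** 2
         + math.cos(d1) * math.cos(d2) * math.sin((r2 - r1) / 2) ** 2)
    return math.degrees(2 * math.asin(min(1.0, math.sqrt(h))))


def close_pair_weights(targets):
    """Return w_cp per target: 1 for objects with a fibre, plus 1 per collided neighbour."""
    w_cp = [1.0 if t["has_fibre"] else 0.0 for t in targets]
    for t in targets:
        if t["has_fibre"]:
            continue
        # Candidates: same collision group, same target class, with a fibre.
        cands = [k for k, u in enumerate(targets)
                 if u["has_fibre"] and u["group"] == t["group"]
                 and u["tclass"] == t["tclass"]]
        if not cands:
            continue  # no same-class partner with a fibre: no correction applied
        nearest = min(cands, key=lambda k: angular_sep_deg(
            t["ra"], t["dec"], targets[k]["ra"], targets[k]["dec"]))
        w_cp[nearest] += 1.0
    return w_cp
```

The neighbour is upweighted regardless of its eventual spectroscopic classification, matching the reasoning given above that the missed object could belong to any class.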

Our adopted fibre collision correction scheme neglects a few 
subtle cases: 

• No corrections are applied for objects that are the only members 
of their target class in their fibre collision group, and did not 
receive a fibre. For CMASS, this class represents 4% of all targets 
in fibre collision groups, and 0.7% of all CMASS targets overall. 
Since there are more CMASS targets per unit area, this effect is 
larger for LOWZ targets: 12% of all collided LOWZ targets and 
1.4% of the full sample. Treating such collision pairs as unassociated 
is still a good approximation. To verify this assumption, we 
examined all collision groups consisting of a single LOWZ target and 
single CMASS target, and for which we obtained both redshifts. 
Only 11% of such pairs had line-of-sight separations smaller than 
$50\,h^{-1}$ Mpc. 

• No corrections are applied when none of the multiple objects 
of the same target class in a fibre collision group were assigned a 
fibre. These galaxies are treated as random incompleteness in the 
survey coverage and comprise 0.14% of the total galaxy sample. 

• Finally, 0.3% of targets did not receive a fibre due to collisions 
with targets other than CMASS and LOWZ but of the same priority. 
Again we treat these missing redshifts as random. 

Tables 3 and 4 provide statistics about the distribution of CMASS 
and LOWZ galaxies in fibre collision groups and how the probability 
of assigning a fibre to a pair of collided galaxies in the same 
fibre collision group depends on the size of the collision group. 
Approximately 75% of collided galaxies are in a group of only two, 
and group sizes above four are quite rare. In Table 4, $f_{\mathrm{fibre}}$ reports 
the fraction of galaxies in a collision group that received a 
spectroscopic fibre, as a function of $n_{\mathrm{tiles}}$, the number of spectroscopic 
tiles covering their sector. In the remaining columns we report the 
fraction of pairs of CMASS+LOWZ targets in the same collision 
group for which both targets received a fibre, both globally ($C_{\mathrm{pair}}$), 
and as a function of $n_{\mathrm{group}}$. In regions covered by a single 
spectroscopic tile, only a small fraction of pairs with $n_{\mathrm{group}} = 2$ both 
receive a spectroscopic fibre (4%). Such pairs must be sourced from 
collision groups containing at least one target of another class, 
oriented such that the two CMASS/LOWZ targets in the group are 
separated by more than 62". As expected, for $n_{\mathrm{tiles}} > 1$ pairs in 
smaller collision groups are more likely to be resolved, and the 
majority of fibre collisions are removed. 

Finally, to understand the impact of fibre collision corrections 
on our estimate of the true galaxy density field, we examine the 
apparent separation for pairs of galaxies in the same fibre collision 
group for which good redshifts were obtained for both. Fig. 10 
shows the distribution near $\Delta z \approx 0$, although the tails extend to 
much larger separations. We have converted redshift separations 
to apparent distance separations using the fiducial cosmological 
model. The observed distribution (coloured lines) can be fit by a flat 
background and an exponential distribution centered on $\Delta z = 0$ 
(black lines). The fraction of resolved fibre collision pairs that are 
“correlated” (i.e., contribute to the exponential component in the fit 
to the pairwise separation histogram) is 52% for pairs of CMASS 
targets and 62% for pairs of LOWZ targets, i.e., nearly half of fibre 
collision pairs are unassociated projections. Interestingly, the 
width of the distribution is consistent with $\sigma_{\mathrm{exp}} = 5.4\,h^{-1}$ Mpc for 
both target classes and is generally consistent with halo modeling 
expectations. 

Since the choice of which galaxies are assigned fibres in a 
collision group is completely random (apart from maximising the 
number of targets receiving a fibre), the object not assigned a fibre 
is statistically equivalent to the one we upweight, and so, once 
upweighted, correlations at transverse separations larger than the fibre 
collision scale should be unbiased. However, correlations at 
transverse separations below the collision scale will be biased, since 
we are removing these small-scale pairs. Additionally, these 
small-scale variations will be anisotropic, and are therefore likely to have a 
stronger effect on the quadrupole than on the monopole moments 
of 2-point clustering statistics, for example. We therefore advocate 
constructing statistics that do not apply these weights in situations 
where these effects are important; see Reid et al. (2014) for an 
example configuration space statistic. 

Table 3. Distribution of galaxies across fibre collision group sizes. The 
largest collision group (not listed) contains 17 galaxy targets. The first 
column provides the fraction of CMASS targets in groups with $n_{\mathrm{group}}$ 
CMASS targets, restricted to groups with at least two CMASS targets. The 
second column shows the same calculation for LOWZ targets. The final 
column lists the fibre collision group size distribution, where $n_{\mathrm{group}}$ includes 
both CMASS and LOWZ targets. For consistency across the mask these 
results were computed from the LOWZ sample footprint (chunks ≥ 7). For 
reference, the fraction of galaxies that are not in any collision group is 77%. 

n_group   f_CMASS   f_LOWZ   f_C+L 
2         0.7631    0.8456   0.7566 
3         0.1687    0.1182   0.1726 
4         0.0440    0.0270   0.0461 
5         0.0146    0.0070   0.0150 
6         0.0059    0.0010   0.0057 
7         0.0023    0.0005   0.0022 
8         0.0007    0.0003   0.0008 
9         0.0003    0.0006   0.0003 

Table 4. Fibre collision statistics for targets in regions covered by $n_{\mathrm{tiles}}$ 
spectroscopic tiles. The second column shows the fraction of the total mask 
area covered by $n_{\mathrm{tiles}}$ tiles. The third column gives $f_{\mathrm{fibre}}$, the fraction of all 
collided galaxies that were assigned a fibre. The remaining columns specify 
the fraction of pairs of galaxy targets (CMASS + LOWZ) in the same collision 
group for which both targets received a fibre, both globally ($C_{\mathrm{pair}}$), and 
as a function of $n_{\mathrm{group}}$. We use the global fraction $C_{\mathrm{pair}}$ to remove collided 
pairs and approximate the fibre collision effect in our mock galaxy catalogues. 
We track $C_{\mathrm{pair}}$ separately for the NGC and SGC and for CMASS, 
LOWZ or combined catalogues, but in practice the values are similar in 
each case to those reported here. 

n_tiles   f_area   f_fibre   C_pair   C_pair(2)   C_pair(3)   C_pair(4) 
1         0.54     0.561     0.092    0.042       0.159       0.142 
2         0.41     0.945     0.820    0.971       0.685       0.589 
3         0.05     0.992     0.966    0.992       0.985       0.915 
4         0.0005   1.000     1.000    1.000       1.000       - 

Figure 10. The probability distribution of apparent line-of-sight separations 
for pairs of galaxies in the same fibre collision group and for which both 
have good redshifts. The left panel uses pairs of CMASS targets and the 
right panel uses pairs of LOWZ targets. Both distributions can be fit with the 
sum of a background term and an exponential, $a\,\exp(-|\Delta z|/\sigma) + b$, in the range 
$|\Delta z| < 50\,h^{-1}$ Mpc. A total of 52% (62%) of the CMASS (LOWZ) pairs 
contribute to the exponential term. The best fit width $\sigma$ of the exponential 
component is $5.4\,h^{-1}$ Mpc for both CMASS and LOWZ targets. 
Horizontal axes: $\Delta z$ [$h^{-1}$ Mpc]. 


6.2 Treatment of “known” targets 

As the pre-observed “known” sample is complete (no failures are 
kept), it does not match the angular distribution induced by variations 
in completeness of the galaxies spectroscopically observed by 
BOSS. Rather than try to model the distribution of known galaxies, 
we instead subsampled these data to match BOSS completeness in 
each sector, thus imposing the BOSS mask on the known galaxies. 
In this way we make the sample indistinguishable from 
BOSS-observed targets. In earlier data releases (DR9-11) we also marked 
a fraction of the galaxies in a 62" close pair containing at least one 
object from the “known” sample as fibre-collided; we did not apply 
this step in our DR12 analysis and describe the difference in more 
detail below. 

In the DR9-DR11 catalogues we additionally marked a fraction of 
the galaxies in a 62" close pair containing at least one object from 
the “known” sample as fibre-collided, and assigned its weight to 
its nearest neighbour. This fraction was determined by measuring 
the fraction of BOSS targets in 62" pairs that were fibre collision 
corrected in each sector. In sectors covered only by a single spectroscopic 
tile all 62" pairs were collided. The original motivation of this 
correction was to impose the same fibre collision completeness on the 
“known” targets as the BOSS targets. In DR12 we did not apply this 
correction. The rationale was that on sufficiently large scales the 
nearest neighbour upweighting scheme restores the correct clustering 
statistics, and so should be equivalent to using the 
measured redshifts. However, we expect the effective shot noise to 
be larger when using the former procedure. Correlation function 
and power spectrum analyses that marginalize over a shot noise 
term should be unaffected by this choice; analyses of smaller-scale 
clustering should examine this issue further. This change is particularly 
important for clustering of the LOWZ sample because of the 
large overlap with the “known” galaxy sample. 
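The subsampling of “known” galaxies to the BOSS completeness in each sector (Sec. 6.2) can be illustrated with a simple random thinning. The sketch below assumes that each known galaxy is kept independently with probability equal to the BOSS fibre completeness of its sector; this is one way to realise the subsampling described, and the function and argument names are hypothetical rather than taken from MKSAMPLE.

```python
import numpy as np


def subsample_known(sector_ids, c_boss_by_sector, seed=0):
    """Boolean mask selecting known galaxies kept after completeness thinning."""
    rng = np.random.default_rng(seed)
    # Per-galaxy keep probability, looked up from the galaxy's sector.
    keep_prob = np.array([c_boss_by_sector[s] for s in sector_ids], dtype=float)
    # Uniform draws lie in [0, 1), so probability 1 always keeps and 0 never keeps.
    return rng.random(len(sector_ids)) < keep_prob
```

After thinning, the surviving known galaxies follow the same sector-by-sector completeness as BOSS-observed targets, so a single mask describes both.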


6.3 Redshift Failures 

For 1.8% (0.5%) of CMASS (LOWZ) targets, the spectroscopic 
pipeline fails to obtain a robust redshift. We do not necessarily 
expect these to be distributed randomly with respect to, e.g., plate 
centre or redshift, and so we again adopt a nearest neighbour 
upweighting scheme to account for these objects. Redshift failure 
galaxies were permitted to be upweighted because of a nearest 
neighbour fibre collision. We therefore transfer the total weight to the 
nearest neighbour of the redshift failure, incrementing a weight 
$w_{\mathrm{noz}}$, labelled WEIGHT_NOZ in the DR12 LSS catalogues. The 
upweighted object must be classified either as a good galaxy or star 
redshift. 

In the DR9-DR11 large-scale structure catalogues we removed 
sectors with redshift success rates below 80% and at least ten good 
redshifts; in our DR12 catalogue we exclude troublesome observations 
by restricting the mask to regions with PLATEQUALITY of 'good', 
and do not remove the handful of sectors that would have been 
excluded using the DR9-11 criteria. Upon closer examination, we 
found that sectors failing the DR9-11 cut contained a small number 
of targets and were therefore subject to small number statistics; we 
checked that targets in those sectors were drawn from plates with 
high redshift success rates. 

In DR9-11 we searched for redshift failure neighbours to upweight 
only in the same sector; in the DR12 catalogue we only 
consider neighbours observed on the same plate (which spans multiple 
sectors) and same date, which restricts the neighbour search 
to galaxies observed under approximately the same conditions, and 
means the weighted number of classified objects in each sector 
matches the number of targets. In the majority of cases, the closest 
neighbour restricted to the same sector and the closest neighbour 
restricted to the same plate and date are the same object. The median 
angular separation between galaxies without a good redshift and their 
closest neighbour using the updated algorithm is 3.7' (3.9') in the 
north (south), compared with 2.9' using the sector-based algorithm. 
Total counts of redshift failures for CMASS and LOWZ galaxies are 
listed in Table 2. 

In CMASS, redshift failures are more likely to occur on faint 
targets; see Fig. 7. In the weighting scheme described above the 
neighbouring, up-weighted galaxies are drawn from the distribution 
of observed galaxies, which in turn are brighter on average than 
the galaxies that failed to yield a good redshift. Given the slight 
correlation of $i_{\mathrm{fib2}}$ with redshift, this introduces a small 
redshift-dependent bias in the LSS catalogues. To ameliorate this effect, 
we modify the redshift-failure weights such that the weighted 
$i_{\mathrm{fib2}}$ distribution of the corrective weight matches the $i_{\mathrm{fib2}}$ 
distribution of the targets with failed redshifts. In practice, 
acknowledging that an up-weighted galaxy might be a neighbour to more 
than one redshift failure, we compute 
$w_{\mathrm{noz,new}} = 1 + (w_{\mathrm{noz,old}} - 1)\,w_{i_{\mathrm{fib2}}}$, 
where $w_{i_{\mathrm{fib2}}} = n(i_{\mathrm{fib2}}, \mathrm{noz})/n(i_{\mathrm{fib2}}, \mathrm{cp})$, with $n(i_{\mathrm{fib2}}, \mathrm{noz})$ 
and $n(i_{\mathrm{fib2}}, \mathrm{cp})$ corresponding to the green and red lines of Fig. 7 
respectively. To avoid $w_{i_{\mathrm{fib2}}}$ being dominated by Poisson noise 
in any given bin of $i_{\mathrm{fib2}}$, we set $w_{i_{\mathrm{fib2}}} = 1$ for any bin where 
$n(i_{\mathrm{fib2}}, \mathrm{noz})$ or $n(i_{\mathrm{fib2}}, \mathrm{cp})$ is less than ten. The weights are 
normalised such that $\sum w_{\mathrm{noz,new}} = \sum w_{\mathrm{noz,old}}$. This scheme 
effectively transfers weight from bright to faint neighbours of 
redshift failures. We only apply this extra correction to CMASS for 
two reasons: firstly, the LOWZ redshift-failure rate is very small 
(0.5%) and, secondly, we find no significant dependence of redshift 
failure on $i_{\mathrm{fib2}}$ for LOWZ targets. 
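The $i_{\mathrm{fib2}}$-dependent correction to the redshift-failure weights can be sketched as below. Several details are assumptions of this illustration rather than statements about MKSAMPLE: the per-bin ratio is formed from the normalised histograms of the two populations, the ten-object threshold is applied to raw counts, and the final normalisation rescales only the excess weight $(w_{\mathrm{noz}} - 1)$ so that the sum of the weights is preserved.

```python
import numpy as np


def reweight_noz(w_noz_old, ifib2, bin_edges, counts_noz, counts_cp, min_count=10):
    """Return w_noz,new = 1 + (w_noz,old - 1) * w_ifib2, with the sum preserved."""
    w_old = np.asarray(w_noz_old, dtype=float)
    counts_noz = np.asarray(counts_noz, dtype=float)
    counts_cp = np.asarray(counts_cp, dtype=float)

    # Ratio of normalised distributions; bins with few objects keep ratio 1.
    ratio = np.ones_like(counts_noz)
    ok = (counts_noz >= min_count) & (counts_cp >= min_count)
    ratio[ok] = ((counts_noz[ok] / counts_noz.sum())
                 / (counts_cp[ok] / counts_cp.sum()))

    # Look up each galaxy's bin, clipping values outside the binned range.
    idx = np.clip(np.digitize(ifib2, bin_edges) - 1, 0, len(ratio) - 1)
    excess = (w_old - 1.0) * ratio[idx]

    # Rescale the excess so that sum(w_new) equals sum(w_old).
    total_old = (w_old - 1.0).sum()
    if excess.sum() > 0:
        excess *= total_old / excess.sum()
    return 1.0 + excess
```

Galaxies that carry no redshift-failure weight ($w_{\mathrm{noz}} = 1$) are left unchanged, so the scheme only redistributes weight among upweighted neighbours.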


6.4 Angular Systematic Weights 

For the DR12 data we follow the same approach as described in 
Ross et al. (2012) and updated in Anderson et al. (2014b) to remove 
non-cosmological fluctuations in CMASS target density with 
stellar density and seeing. The LOWZ targets are brighter and do 
not show significant variations with these quantities, so LOWZ 
targets do not require these weights. 

In DR12 we update the $N_{\mathrm{side}} = 128$ HEALPix® stellar density 
map to include all stars with i-band magnitudes between 17.5 and 
19.9; the map used in DR10/DR11 did not impose the 17.5 bright 
cut. The two maps also differ by a factor of the pixel area, 0.210 
deg$^2$. The functional form for $w_{\mathrm{star}}$ was also updated in DR12 to 
be the inverse of a linear relation: 

$$w_{\mathrm{star}}(n_s, i_{\mathrm{fib2}}) = \left(A_{i_{\mathrm{fib2}}} + B_{i_{\mathrm{fib2}}}\, n_s\right)^{-1}, \qquad (45)$$

while in DR10/DR11 $w_{\mathrm{star}}$ was linearly dependent on $n_s$; see 
Ross et al. (2015) for details. These two differences explain the 
changes to the values of the $A_{i_{\mathrm{fib2}}}$ and $B_{i_{\mathrm{fib2}}}$ parameters 
between DR10/DR11 and DR12. The DR12 parameter values for 
$w_{\mathrm{star}}$, determined using all galaxies in the CMASS catalog with 
$0.35 < z < 1.0$, are $A_{i_{\mathrm{fib2}}}$ = [0.959, 0.994, 1.038, 1.087, 1.120] 
and $B_{i_{\mathrm{fib2}}}$ = [0.826, 0.149, -0.782, -1.83, -2.52] x lO"'', 
computed in 0.3 magnitude width $i_{\mathrm{fib2}}$ bins centred at 
[20.45, 20.75, 21.05, 21.35], as in Anderson et al. (2014b). The 
parameter $w_{\mathrm{star}}$ is determined for each galaxy by first linearly 
interpolating the $A_{i_{\mathrm{fib2}}}$ and $B_{i_{\mathrm{fib2}}}$ fits to derive a value at each galaxy’s 
$i_{\mathrm{fib2}}$, and then using Eq. 45. The distribution of weight values is 
similar in the NGC and SGC and, overall, 93% of CMASS galaxies 
have $0.95 < w_{\mathrm{star}} < 1.1$. 
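The two-step evaluation of the stellar density weight (interpolate the fit parameters in $i_{\mathrm{fib2}}$, then apply the inverse linear relation of Eq. 45) can be sketched with NumPy. The function name is hypothetical, and the bin centres and parameter arrays are left as arguments because the source lists five parameter values but only four bin centres; any numbers passed in when trying this out are placeholders.

```python
import numpy as np


def stellar_weight(n_s, ifib2, bin_centres, a_vals, b_vals):
    """w_star = 1 / (A + B * n_s), with A and B linearly interpolated in ifib2."""
    # np.interp holds the end values constant outside the tabulated range.
    a = np.interp(ifib2, bin_centres, a_vals)
    b = np.interp(ifib2, bin_centres, b_vals)
    return 1.0 / (a + b * np.asarray(n_s, dtype=float))
```

Holding the end values fixed outside the tabulated magnitude range is a choice made by `np.interp` in this sketch; the source does not state how galaxies beyond the outermost bin centres are treated.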

® http://healpix.jpl.nasa.gov/ 

For DR10/DR11 analyses, a map of the DR8 i-band seeing, 
$S_i$, was created by taking the mean seeing value within HEALPix 
pixels with $N_{\mathrm{side}} = 1024$ over the primary SDSS galaxies in the 
DR8 Catalogue Archive Server. For DR12, we instead directly 
query the imaging data to determine the conditions estimated for 
each galaxy’s parent imaging field. Per-object and per-field seeing 
estimates are calculated differently. Empirically, these two methods 
for determining $S_i$ differ by a factor of ~0.9. There is also 
scatter between per-field and per-object estimates of sky flux and 
airmass. The DR12 galaxy and random catalogues contain fields 
for 'PSF_FWHM', 'AIRMASS', 'SKYFLUX', 'EB_MINUS_V', 
and 'IMAGE_DEPTH' if users want to further explore systematic 
relationships. In what follows, the i-band seeing $S_i$ = 
PSF_FWHM[3]. For DR12 we adopted a slightly different parameter 
convention from that of earlier catalogues®: 

$$w_{\mathrm{see}}(S_i) = A_{\mathrm{see}}^{-1} \left[1 - \mathrm{erf}\left(\frac{S_i - B_{\mathrm{see}}}{\sigma_{\mathrm{see}}}\right)\right]^{-1}. \qquad (46)$$

In addition, we fit the systematic relationship separately for the 
NGC and SGC, again restricting the fits to objects in the CMASS 
LSS catalogues with $0.35 < z < 1.0$. The DR12 parameter values 
are $A_{\mathrm{see}}$ = 0.5205 (0.5344), $B_{\mathrm{see}}$ = 2.844 (2.267), and $\sigma_{\mathrm{see}}$ = 
1.236 (0.906) for the NGC and SGC, respectively. In DR10/DR11 
we also set $w_{\mathrm{see}}(S_i > 2''.5) = w_{\mathrm{see}}(S_i = 2''.5)$; this action is no 
longer necessary since the DR12 veto masks remove all area with 
$S_i > 2''.0$. 
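As a numerical check of the seeing weight, the sketch below evaluates Eq. (46) with the DR12 parameters quoted above. It assumes the reading $w_{\mathrm{see}} = A_{\mathrm{see}}^{-1}[1 - \mathrm{erf}((S_i - B_{\mathrm{see}})/\sigma_{\mathrm{see}})]^{-1}$; this reading gives weights close to unity for typical seeing, whereas placing $A_{\mathrm{see}}$ in the numerator would give weights near 0.27. The function and dictionary names are hypothetical.

```python
import math

# (A_see, B_see, sigma_see) as quoted in the text for each Galactic cap.
SEEING_PARAMS = {
    "NGC": (0.5205, 2.844, 1.236),
    "SGC": (0.5344, 2.267, 0.906),
}


def seeing_weight(s_i, cap="NGC"):
    """Seeing-dependent systematic weight for i-band seeing s_i in arcsec."""
    a_see, b_see, sigma_see = SEEING_PARAMS[cap]
    # Assumed form of Eq. (46): both A_see and the bracket are inverted.
    return 1.0 / (a_see * (1.0 - math.erf((s_i - b_see) / sigma_see)))
```

For example, `seeing_weight(1.1, "NGC")` is slightly below one, and the weight rises as the seeing degrades towards the 2.0 arcsec veto limit, consistent with fewer targets being selected in poor seeing.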

Finally, the application of the CMASS z-band star/galaxy 
separation cut in the LOWZE3 sample induced a significant dependence 
of the sample number density on $S_i$ that varies with the 
i-band model magnitude; see Ross et al. (2015) for details. The 
systematic weight for this sample is 

$$\ell = \max\left(-2,\; b + m\,(i_{\mathrm{mod}} - 16.)^{0.5}\right) \qquad (47)$$

$$w_{\mathrm{see,LOWZE3}} = \min\left(5,\; \left(1. + (S_i - 1.25)\,\ell\right)^{-1}\right) \qquad (48)$$

with parameters $b = 0.875$ and $m = -2.226$, fit using all objects 
in the LOWZE3 catalogue with $0.2 < z < 0.5$, including objects 
in chunks > 6 in addition to the LOWZE3 targeted region, chunks 
3-6. 

The total angular systematic weights are simply the product of 
$w_{\mathrm{star}}$ and $w_{\mathrm{see}}$ for each object with index $i$: 

$$w_{\mathrm{systot},i} = w_{\mathrm{star},i}\, w_{\mathrm{see},i}. \qquad (49)$$


6.5 Total Galaxy Weights 

Finally, we combine the angular systematics weight $w_{{\rm systot},i}$ with 
the fibre collision and redshift failure nearest neighbour weights to 
produce a final weight for each object i in the final catalogue: 

$w_{{\rm tot},i} = w_{{\rm systot},i}\,(w_{{\rm cp},i} + w_{{\rm noz},i} - 1)$.    (50) 

Since the default values of both $w_{{\rm cp},i}$ and $w_{{\rm noz},i}$ are 1, the term 
in parentheses conserves the total number of galaxy targets. This 
is the galaxy weighting consistent with the construction of the LSS 
catalogues provided, and must be used to obtain unbiased estimates 
of the galaxy density field, since this weight is used when assigning 
the random galaxy redshifts; see Sec. 5.2. 
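
A minimal sketch of Eq. (50) is given below, together with a toy check that 
the term in parentheses conserves the number of targets. The function name, 
the dictionary keys, and the toy catalogue are illustrative assumptions, not 
the names used in MKSAMPLE or in the released catalogues. 

```python
def total_weight(w_systot, w_cp=1.0, w_noz=1.0):
    """Eq. (50): w_tot = w_systot * (w_cp + w_noz - 1)."""
    return w_systot * (w_cp + w_noz - 1.0)


# Toy example: five targets, of which one lost its fibre to a collision and
# one failed to yield a redshift. Only the three galaxies with good redshifts
# are kept; two of them were up-weighted as the nearest neighbours of the
# missing objects.
catalogue = [
    {"w_systot": 1.0, "w_cp": 2.0, "w_noz": 1.0},  # neighbour of a collided target
    {"w_systot": 1.0, "w_cp": 1.0, "w_noz": 2.0},  # neighbour of a redshift failure
    {"w_systot": 1.0, "w_cp": 1.0, "w_noz": 1.0},  # unaffected galaxy
]

# With w_systot = 1 the weighted count equals the number of targets (5.0).
weighted_count = sum(total_weight(**row) for row in catalogue)

if __name__ == "__main__":
    print(weighted_count)
```

Subtracting 1 inside the parentheses avoids counting a galaxy twice when it 
carries both a close-pair and a redshift-failure up-weight. 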


® Eq. 20 of Anderson et al. (2014b) should state 
$w_{\rm see}(S) = 2 A_{\rm see}^{-1} \left[ 1 - {\rm erf}\left( \frac{S - B_{\rm see}}{\sigma_{\rm see}} \right) \right]^{-1}$ 
for the parameter values listed in that text. 
6.6 Angular Density and Redshift Distribution 


We estimate the angular density of galaxy targets as the total num¬ 
ber of targets within the final LSS mask divided by the total non- 
vetoed area within the sample LSS mask. The values for each target 
class are listed in the final line of Table We convert this angu¬ 
lar target density into a three-dimensional space density through a 
properly normalised redshift probability distribution: 

$p(z_j, z_j + dz)\,dz \propto \sum_{i:\, z_i \in [z_j, z_j + dz]} w_{{\rm tot},i} \Big/ \sum_i w_{{\rm tot},i}$,    (51) 


where we sum over all objects in the catalogue with good spectroscopic 
redshifts, and $w_{{\rm tot},i}$ is the total weight assigned to target i 
to account for various observational artefacts (Eq. 50). The inclusion 
of $w_{{\rm systot},i}$ in the estimate for p(z) accounts for any impact 
of the angular systematics on the (normalised) redshift distribution, 
through e.g., the $i_{\rm fib2}$ dependence of the stellar weights. However, 
our estimator for the angular target density does not recover the 
true target density in the absence of stars and imperfect seeing, 
but an average target density over the survey footprint. Finally, we 
use the fiducial cosmology to determine the number of targets per 
$h^{-3}\,{\rm Mpc}^3$. The result is shown in Fig. 11 for all four target classes, 
as well as the sum of the CMASS and LOWZ sample number densities 
(with duplicate CMASS and LOWZ targets counted only once). 
The CMASS+LOWZ number density reaches a local minimum in 
the overlap region of $n(z = 0.41) = 2.2 \times 10^{-4}\,h^3\,{\rm Mpc}^{-3}$. As 
reported in the previous sections, survey incompleteness, fibre 
collisions, redshift failures, and stars in the target sample all reduce 
the average angular density of good galaxy redshifts compared to 
the angular target density; their aggregate impact is a 10% (4.4%) 
reduction for CMASS (LOWZ). Finally, we compute the effective 
volume $V_{\rm eff}$, which quantifies the reach of a sample for making 
cosmological measurements, for the CMASS and LOWZ samples 
following the same algorithm outlined in Anderson et al. (2014b), 
summing over 200 redshift shells 



$V_{\rm eff} = \sum_i \left[ \frac{n(z_i) P_0}{1 + n(z_i) P_0} \right]^2 \Delta V(z_i)$,    (52) 


where $\Delta V(z_i)$ is the volume of the shell at $z_i$, and we assume that 
$P_0 = 10\,000\,h^{-3}\,{\rm Mpc}^3$, which we have changed since DR11, so 
the numbers are not directly comparable to Anderson et al. (2014b). 
We find $V_{\rm eff}$ = 5.1 Gpc$^3$ for CMASS and 2.3 Gpc$^3$ for LOWZ. 
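
The shell sum can be sketched as follows, assuming the standard form of the 
effective volume in which the ratio n P_0 / (1 + n P_0) is squared. The 
function name and inputs are hypothetical; the number densities and shell 
volumes must be supplied in mutually consistent units (for example 
h^3 Mpc^-3 and h^-3 Mpc^3). 

```python
def effective_volume(nbar_shells, shell_volumes, p0=10000.0):
    """Sum [n P0 / (1 + n P0)]^2 * dV over redshift shells.

    nbar_shells   -- number density n(z_i) in each shell
    shell_volumes -- comoving volume dV(z_i) of each shell
    p0            -- assumed power spectrum amplitude P_0
    """
    if len(nbar_shells) != len(shell_volumes):
        raise ValueError("need one number density per shell")
    total = 0.0
    for nbar, dv in zip(nbar_shells, shell_volumes):
        x = nbar * p0
        total += (x / (1.0 + x)) ** 2 * dv
    return total


if __name__ == "__main__":
    # Two toy shells: one with n P0 = 1 and one with n P0 = 3.
    print(effective_volume([1.0e-4, 3.0e-4], [4.0, 16.0]))
```

A shell with n P_0 much greater than 1 contributes nearly its full volume, 
while a sparsely sampled shell with n P_0 much less than 1 contributes 
almost nothing. 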


6.7 FKP weights 

Feldman, Kaiser & Peacock (1994), hereafter FKP, showed that 
the optimal weighting of galaxies as a function of redshift depends 
on the number density of galaxy tracers. The optimal weight 
$w_{\rm FKP}$ depends on the amplitude of the power spectrum in the 
power spectrum bin of interest. In practice, we use the same value 
$P_0 = 10\,000\,h^{-3}\,{\rm Mpc}^3$ to estimate both the power spectrum and 
correlation function on all scales. This value of $P_0$ corresponds 
to the observed power spectrum at $k \approx 0.15\,h\,{\rm Mpc}^{-1}$. The field 


Our calculation of the 'NBAR' field in the galaxy and random catalogues 
estimates the angular density of the sample as 
$A^{-1} \sum (w_{\rm cp} + w_{\rm noz} - 1)$. Here $A$ is the completeness weighted 
area inside the mask and the sum is over all galaxies in the catalogue with 
good redshifts. This method is slightly noisier since the completeness in 
each region is estimated from a finite number of galaxies. We verified that 
the two methods agree to within 0.02%. 



Figure 11. Number density of all four target classes as a function of 
redshift z, assuming our fiducial cosmology with $\Omega_m$ = 0.31, along with 
the sum of the CMASS and LOWZ number densities (black). 


'WEIGHT_FKP' in the DR12 galaxy and random catalogues is 
given by 

$w_{{\rm FKP},i} = \frac{1}{1 + n(z_i) P_0}$    (53) 


for an object with redshift $z_i$, where $n(z_i)$ is computed by linear 
interpolation over bins with $\Delta z$ = 0.005 starting at z = 0. 
The $w_{\rm FKP}$ weight is optional in LSS analyses. To utilize these 
weights in a large scale structure analysis, one must weight both 
data and random objects; the final weight of galaxy i is therefore 
$w_{{\rm tot},i}\, w_{{\rm FKP},i}$ and the final weight of random object j is 
$w_{{\rm FKP},j}$. If one does not use the FKP weights (i.e., as in Reid et 
al. 2014), consistent weightings of the galaxy and random catalogues 
are $w_{{\rm tot},i}$ and $w_j = 1$, respectively. 
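
The following sketch shows one way to evaluate Eq. (53) with a linearly 
interpolated n(z). It assumes that n(z) is tabulated at a sorted list of 
redshift nodes; whether those nodes are bin centres or bin edges is not 
specified here, and the function names are hypothetical rather than taken 
from MKSAMPLE. 

```python
from bisect import bisect_right


def interp_nbar(z, z_nodes, nbar_nodes):
    """Linearly interpolate n(z) between tabulated nodes.

    z_nodes must be sorted in increasing order. Outside the tabulated
    range the nearest end value is returned (an assumption of this sketch).
    """
    if z <= z_nodes[0]:
        return nbar_nodes[0]
    if z >= z_nodes[-1]:
        return nbar_nodes[-1]
    hi = bisect_right(z_nodes, z)  # first node strictly above z
    lo = hi - 1
    frac = (z - z_nodes[lo]) / (z_nodes[hi] - z_nodes[lo])
    return nbar_nodes[lo] + frac * (nbar_nodes[hi] - nbar_nodes[lo])


def fkp_weight(nbar, p0=10000.0):
    """Eq. (53): w_FKP = 1 / (1 + n(z) P_0)."""
    return 1.0 / (1.0 + nbar * p0)


if __name__ == "__main__":
    z_nodes = [0.0, 0.005, 0.010]
    nbar_nodes = [0.0, 2.0e-4, 4.0e-4]
    nbar = interp_nbar(0.0025, z_nodes, nbar_nodes)
    print(nbar, fkp_weight(nbar))
```

In an analysis that uses these weights, each galaxy would then carry the 
product of its total weight and fkp_weight, and each random object would 
carry fkp_weight alone. 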

Earlier data releases adopted a different fiducial cosmology 
and assumed $P_0 = 20\,000\,h^{-3}\,{\rm Mpc}^3$ to compute $w_{\rm FKP}$. Percival, 
Verde, & Peacock (2004) updated the analysis of Feldman, 
Kaiser & Peacock (1994) to a weighting scheme that accounts for 
luminosity-dependent clustering; such weights will be presented 
for the BOSS galaxy samples in a forthcoming BOSS team paper. 
However, because our target selection algorithm is so efficient 
at selecting massive galaxies, the gain provided by 
luminosity-dependent weights is modest for our sample. 


7 COMBINED CATALOGUE CREATION 

For the purpose of providing a maximally contiguous three-dimensional 
density field estimate, in DR12 we provide a new catalogue 
that combines the CMASS sample with the three lower redshift 
samples: LOWZE2 (chunk 2), LOWZE3 (chunks 3-6), and LOWZ 
(chunks ≥ 7). See Appendix A for details of the LOWZE2 and 
LOWZE3 samples. A precise geometric description of the sky area 
covered by each sample is provided in mangle mask format, 
constructed such that every sector included in the CMASS mask is 
included in exactly one of the LOWZE2, LOWZE3, or LOWZ 
footprints. We also construct two additional masks, one including 
the LOWZE2+LOWZ sky coverage and another including the 
LOWZE3+LOWZ sky coverage. 

Using those masks, we first generate a LOWZE2 catalogue 
including chunk 2 and chunks ≥ 7 and a LOWZE3 catalogue 
including chunks ≥ 3 using the target selection algorithms detailed 
in Appendix A. This is possible since all the galaxies passing 
LOWZE2 and LOWZE3 cuts will also pass the LOWZ cuts. Producing 
a catalogue across a larger fraction of the sky allows a more 
accurate estimate of n(z) for the LOWZE2 and LOWZE3 samples 
(and therefore a better means of assigning redshifts to the random 
galaxy sample). Without this step, the average density in chunk 2 
and chunks 3-6 would be poorly determined and could lead to 
erroneous reconstruction flows towards or away from those regions in 
the final combined catalogues. As discussed in Sec. 6.4 and Ross et 
al. (2015), there is a significant correlation between i-band seeing 
and LOWZE3 target density which we remove using a systematic 
weight given by Eq. (48); LOWZE2 and LOWZ samples require no 
systematic weight corrections. We follow this same procedure with 
some minor but important differences when combining CMASS 
and LOWZ catalogues. After full footprint data and random 
catalogues are produced, we trim each catalogue back to its original 
targeted region (i.e., LOWZE2 in chunk 2, LOWZE3 in chunks 3-6, 
and LOWZ in chunks ≥ 7) using the mutually exclusive masks 
discussed above. 

Our algorithm to generate the combined catalogue from the 
four different samples (CMASS, LOWZ, LOWZE2, LOWZE3) is 
as follows: 

• Renormalize the CMASS galaxy systematic weights, 
$w_{{\rm systot},i} \propto w_{{\rm systot},i}$ (i.e. rescale them by a constant factor), 
such that 

$\frac{\sum_i w_{{\rm systot},i}\,(w_{{\rm cp},i} + w_{{\rm noz},i} - 1)}{\sum_i (w_{{\rm cp},i} + w_{{\rm noz},i} - 1)} = 1$.    (54) 

This ensures that in the combined catalogue, a CMASS target and a 
LOWZ target on average have equal weight in each of the three 
distinct regions. The functional form chosen for $w_{\rm see}$ and $w_{\rm star}$ does 
not guarantee this normalisation. Fibre collision and redshift failure 
weights are left the same as in the original CMASS-only catalogue 
and the parameters for the systematic weights are identical to the 
ones in the CMASS-only catalogue (apart from the renormalisation). 

• For each of LOWZ, LOWZE2, LOWZE3 samples 
("LOWZX"), read in the targets (including those in chunks ≥ 7), 
and remove objects already in the CMASS catalogue. Duplicate 
targets are 2.6%, 2.4%, and 4.4% of the LOWZ, LOWZE2, 
and LOWZE3 samples, respectively. Fibre collision and redshift 
failure weights are then recomputed on each duplicate-excluded 
LOWZX sample. As in the previous catalogues, fibre collision 
and redshift failure weights are only assigned to other LOWZX 
targets (not CMASS targets). For the LOWZE3 sample, systematic 
weights are assigned using the same parameters as the 
LOWZE3-only sample, but renormalised as in Eq. 54. 

• Concatenate the CMASS and LOWZX samples and compute 
the completeness of the combined sample in each sector. The rest of 
the catalogue creation steps, i.e., random catalogue generation and 
n(z) estimation, are identical to the algorithms used for the CMASS 
and LOWZ catalogues described previously. 
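
The renormalisation in the first step of the list above (Eq. 54) can be 
sketched as a single rescaling by the count-weighted mean of the systematic 
weights. This is a minimal illustration under the assumption that the same 
constant factor is applied to every object in a sample; the function name 
is hypothetical and the toy weights are invented for the example. 

```python
def renormalise_systot(w_systot, w_cp, w_noz):
    """Rescale w_systot so that the mean weighted by (w_cp + w_noz - 1) is 1."""
    counts = [cp + noz - 1.0 for cp, noz in zip(w_cp, w_noz)]
    weighted_mean = sum(s * c for s, c in zip(w_systot, counts)) / sum(counts)
    return [s / weighted_mean for s in w_systot]


if __name__ == "__main__":
    # Toy sample of three galaxies; the second stands in for a lost neighbour.
    w_systot = [0.9, 1.1, 1.3]
    w_cp = [1.0, 2.0, 1.0]
    w_noz = [1.0, 1.0, 1.0]
    rescaled = renormalise_systot(w_systot, w_cp, w_noz)
    counts = [cp + noz - 1.0 for cp, noz in zip(w_cp, w_noz)]
    check = sum(s * c for s, c in zip(rescaled, counts)) / sum(counts)
    print(rescaled, check)
```

After the rescaling, the summed total weights of a sample equal its summed 
close-pair and redshift-failure counts, so samples with different systematic 
weight models can be concatenated on an equal footing. 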

When analysing the combined catalogue, as well as allowing 
for any evolution in the bias across the sample, one also has to 
consider the differential bias between LOWZ and CMASS samples. 
Although this is expected to be small due to the relatively benign 
transition from LOWZ to CMASS (Ross et al. 2015), a full 
exploration of this issue is left for a forthcoming BOSS team paper. 


8 DISCUSSION 

The small statistical errors achievable on cosmological measurements 
from BOSS data require removal of potential systematic issues 
to an unprecedented level. Spectroscopic target selection and 
mask creation are key areas where systematic problems can be 
introduced if care is not taken to fully understand both. In this paper 
we have presented the target selection for the three primary 
spectroscopic galaxy catalogues within BOSS: LOWZ, CMASS and 
Sparse, and for variations on these used for some early data. Each 
sample has different sky coverage and expected redshift distribution. 

We have also presented the methods used to turn the target 
catalogue and redshift measurement data into galaxy and random 
catalogues, which enable clustering measurements to be quickly 
made, as well as methods to mitigate potential systematics. It may 
be that some analyses are best done without the corrections 
provided - for example, it may be cleaner for small-scale clustering 
analyses not to apply the close-pair weights, but to correct in some 
other manner. 

In addition to a number of improvements over the catalogue 
creation method used for the DR9, DR10 and DR11 samples, we have 
described how we have created a single BOSS catalogue, combining 
CMASS and LOWZ samples. This allows us to include some extra 
galaxies, and maximise the effective volume covered by galaxies 
within BOSS. It also allows us to use a binning scheme in redshift 
different from those of CMASS and LOWZ, optimising our 
cosmological measurements. 

The resulting galaxy and random catalogues, the largest in the 
world, are hosted at http://www.sdss.org/dr12/ together with 
supplemental catalogue and target information. In this final release, 
we also provide copies of our source code, MKSAMPLE, to reproduce 
the DR10, DR11, and DR12 catalogues. The reader should 
consult the source code directly to resolve any ambiguities in our 
description here. 

Next generation spectroscopic experiments, such as eBOSS 
(Dawson et al. 2015), DESI (Levi et al. 2013), HETDEX (Hill et 
al. 2008), 4MOST (de Jong et al. 2014), WEAVE (Dalton et al. 
2014), PFS (Takada et al. 2014), Euclid (Laureijs et al. 2011) and 
WFIRST (Spergel et al. 2015), are expected to make cosmological 
measurements with precision either comparable to or higher by up 
to an order of magnitude than that of BOSS, requiring a 
thorough understanding and extremely careful treatment of potential 
systematic effects. Although each of these future experiments 
has a different observing strategy, they will encounter challenges 
in the process of catalogue creation similar to those of BOSS (e.g. 
variation in the galaxy surface density due to galactic extinction is 
an effect inherent to our observable Universe). The lessons learned 
from the catalogue creation method applied within BOSS, and 
described in this paper, will be of strong benefit for these future 
surveys. 
Figure A1. The location of early chunks, where targeting and/or 
photometric reduction versions differed from the later chunks. Chunk 2 (blue), 
chunks 3-4 (green), chunks 5-6 (red), and chunks 7-11 (black) are shown. 
Chunks 7-11 used an early version of the imaging data reduction software 
(see Section 2). 


APPENDIX A: LOWZ EARLY SELECTION 
ALGORITHMS 

As the survey progressed, there were slight changes to the targeting 
pipeline. In some instances the newer algorithm was stricter than 
the one used in the past, so we simply apply the same cuts to the 
objects targeted earlier as well. One special case is the LOWZ targets 
in chunks 2-6. The star-galaxy separation algorithm for CMASS 
was erroneously applied to those galaxies as well, resulting in a 
drastic reduction in the target density. There are other differences, 
and so we define two algorithms, LOWZE2 as that applied to chunk 
2, and LOWZE3 as that applied to chunks 3-6. In analyses thus far 
we have simply eliminated these early regions from our LOWZ 
catalogue, but we are actively pursuing a sufficient description of that 
population to robustly recover clustering measurements in those 
regions. Removing this area results in a 10% reduction in the LOWZ 
survey mask area. The distributions of the early chunks on the sky 
are shown in Fig. A1. Chunk 2 was commissioning data, and used 
the LOWZE2 version; chunks 3-6 used LOWZE3, and chunks 7-11 
used older photometric reductions, and a different version of 
RESOLVE (see Section 2.1). Chunk 1 was used for very early 
commissioning runs and is not of sufficient uniformity to be used to 
create LSS catalogues. Chunk 1 is located at DEC=0° in the SGC 
footprint (commonly referred to as "Stripe 82"). This area was later 
reobserved with updated target selection as Chunk 11. 

• Chunk 2: The LOWZE2 sample had slightly different $r_{\rm cmod}$ 
cuts and the CMASS i-band star-galaxy separation cut was 
erroneously applied. The catalogue was later trimmed to $16 < r_{\rm cmod}$ 
as well. This selection yields a target density ~ 15% lower than the 
nominal LOWZ target sample. 
• Chunks 3-6: The LOWZE3 sample is the same as chunk 2 but 
with a stricter $17 < r_{\rm cmod}$ bound and both star-galaxy separation 
cuts. This selection yields a target density ~ 45% lower than the 
nominal LOWZ target sample. 
• The $i_{\rm fib2} < 21.5$ CMASS cut was applied to chunks 15 and 
above. Our CMASS LSS catalogue applies this cut to all chunks. 